Galaxy Interface is not accessible
Incident Report for Meteor Cloud
Postmortem

Between Oct 18 and Oct 21, Galaxy’s user interface and deployment functionality were sporadically impaired, for as long as 1.5 hours at a time. During this period, some users were unable to deploy new applications, deploy updates to existing applications, or load the interface that shows the status of their applications. Applications that were already running were not affected.

We have identified the root cause: a code release that added more instrumentation degraded the performance of the Galaxy management application. We identified the problematic code and rolled out fixes on Friday, Oct 21.

Going forward, we have identified improvements to our monitoring that will catch performance degradations like these earlier. We have also improved our performance analysis tooling to reduce the time needed to identify the cause of a similar event. We apologize for the service interruption, and we are following up on these issues to improve Galaxy's uptime.

Posted Oct 24, 2016 - 19:15 EDT

Resolved
We have fixed the bug that caused several Galaxy management interface and deploy server outages this week, and we have deployed the fix to production.
Posted Oct 21, 2016 - 17:51 EDT
Update
We have identified the root cause of this service degradation.
Posted Oct 21, 2016 - 16:52 EDT
Identified
We are working on the fix.
Posted Oct 21, 2016 - 16:49 EDT