Tuesday, February 5, 2013

SPS Sync Outage


Problem:
Today's outage on the sync site resulted from an unanticipated failure mode. The primary sync web server was unable to connect to some of its application files, which caused application errors. Our load balancing system did not detect the site as being down, so it did not redirect traffic to an alternate server.

Resolution:
Internal checks have been added so that if this type of error occurs again, the site will be taken offline, allowing the load balancing system to redirect traffic properly.
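
This post does not describe how the internal checks were implemented. As a rough illustration only, one common approach is a health endpoint that the load balancer polls: it returns 200 when the application files can be read and 503 when they cannot, so the server is pulled from rotation. The sketch below assumes that design; the file path, the /healthz URL, and port 8080 are hypothetical and not taken from our actual setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical list of files the application depends on (an assumption).
REQUIRED_FILES = ["/var/www/sync/app.config"]


def unreadable_files(paths):
    """Return the subset of paths that cannot be opened and read."""
    failed = []
    for path in paths:
        try:
            with open(path, "rb") as handle:
                handle.read(1)  # force an actual read, not just an open
        except OSError:
            failed.append(path)
    return failed


def health_status(paths):
    """Map the file check to an HTTP status code and a short body."""
    failed = unreadable_files(paths)
    if failed:
        return 503, "unhealthy: cannot read " + ", ".join(failed)
    return 200, "ok"


class HealthHandler(BaseHTTPRequestHandler):
    """Serves /healthz (hypothetical path) for the load balancer to poll."""

    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        status, body = health_status(REQUIRED_FILES)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for this sketch.
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

The point of the design is that the check exercises the same dependency that failed, instead of only confirming that the web server answers requests. A load balancer configured to treat any non-200 response from this endpoint as a failure would then stop sending traffic to the server.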