This post-mortem covers a service disruption that affected Swapcard customers on Wednesday, October 18th, 2023. During the incident, the Developer API returned intermittent 504 gateway timeout errors.
Its purpose is to expand on the initial assessment we communicated on the Swapcard status page and to outline the corrective actions we took to restore normal service.
On Wednesday, October 18th, at approximately 4 PM UTC, we observed a surge in 504 gateway timeout errors on the Developer API. This issue affected various external system integrations, excluding those provided by the Studio. Please note that the impact on affected customers may have varied in duration and severity.
Our investigation determined that the problem stemmed from a connectivity issue within our primary developer gateway. The resulting routing problems meant that only one-third of the HTTP requests made during that period reached the appropriate backend Developer APIs. Our Swapcard Response Team, in collaboration with other departments, identified and resolved the connectivity issue within approximately one hour of the initial report.
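For integrators who want to tolerate intermittent 504 responses of the kind described above, one common client-side pattern is to retry with exponential backoff and jitter. The minimal Python sketch below is illustrative only and is not part of our remediation. The names `send` and `call_with_retry`, and all parameter values, are hypothetical; `send` stands in for whatever function issues a single request to the Developer API and returns a status code and body.

```python
import random
import time
from typing import Callable, Iterator, Tuple

GATEWAY_TIMEOUT = 504  # HTTP status observed during the incident


def backoff_delays(max_attempts: int, base_delay: float = 0.5,
                   cap: float = 8.0) -> Iterator[float]:
    """Yield the maximum wait before each retry: exponential growth, capped."""
    for attempt in range(max_attempts - 1):
        yield min(cap, base_delay * (2 ** attempt))


def call_with_retry(send: Callable[[], Tuple[int, str]],
                    max_attempts: int = 5,
                    base_delay: float = 0.5,
                    cap: float = 8.0,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Callable[[], float] = random.random) -> Tuple[int, str]:
    """Call `send` (hypothetical) and retry only while it returns 504.

    At most `max_attempts` calls are made in total. Any other status,
    success or error, is returned to the caller immediately.
    """
    status, body = send()
    for bound in backoff_delays(max_attempts, base_delay, cap):
        if status != GATEWAY_TIMEOUT:
            break
        # "Full jitter": wait a random fraction of the current bound so that
        # many clients do not retry at the same instant.
        sleep(rng() * bound)
        status, body = send()
    return status, body
```

Retries of this kind are only safe for idempotent requests. As a rough guide, and assuming each attempt succeeds independently with probability one-third (an assumption, not something measured during the incident), the chance that at least one of five attempts succeeds is 1 - (2/3)^5, or about 87%.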
Service recovered once network connectivity between the developer gateway and its related backends was restored. Our Swapcard Incident Response team acted swiftly to mitigate the impact on our customers. The incident highlighted areas where we can improve in order to diagnose connectivity issues, network congestion, and related problems more quickly.
The incident also showed that our processes and controls can be improved. We already have established procedures in place, and we acknowledge the opportunity to refine them. Acting on these findings helps us continually strengthen the resilience of our systems and reduce the risk of disruptions.