Summary
On September 17, Shepherd experienced an outage from 4:28 PM ET until 5:01 PM ET, a total of 33 minutes. During this time, users were unable to access the application. Service was restored at 5:01 PM ET and has remained stable since.
What Happened
The outage was triggered by a sudden spike in database requests. Under normal conditions, Shepherd handles around 150 database requests at any given time. On September 17, that number surged to over 1,000 within a few minutes, more than six times the normal load, overwhelming system resources and causing the application to become unavailable.
While the exact trigger for the surge is still under investigation, it appears to be tied to existing queries unexpectedly consuming far more resources than usual. It was not caused by a recent release.
What We Have Implemented
What’s Next
Although the precise root cause of the request surge remains under investigation, immediate safeguards are in place to detect problems earlier, contain them faster, and minimize the risk of another system-wide outage.
Thank you for your continued trust, patience, and partnership.
— The Shepherd Team