Incident ID: AMS1-6f44sjkbj8hx
Date: December 17, 2025
Duration: 5 hours 55 minutes (06:27 – 12:22 Amsterdam Time)
EXECUTIVE SUMMARY
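On December 17, 2025, starting at 06:27 Amsterdam Time, UnderHost experienced a significant network outage affecting multiple racks in our Amsterdam (AMS-EQ01) datacenter. The incident involved two related hardware failures at different infrastructure layers: a faulty power supply unit (PSU) in a shared top-of-rack switch chassis, and a further issue in the edge router infrastructure that was discovered during recovery validation. Recovery required emergency hardware replacement and an accelerated migration of the edge router configuration to the Juniper MX10K platform. All services were fully restored by 12:22 Amsterdam Time.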
WHAT CUSTOMERS EXPERIENCED
Customers with services located in the affected racks experienced partial connectivity loss, including:
TIMELINE OF EVENTS
06:27 – Initial outage detected; multiple racks reported connectivity issues
06:45 – Engineering team engaged; initial diagnosis pointed to network infrastructure
07:15 – Root cause identified: faulty PSU in a shared top-of-rack switch chassis
07:30 – Emergency hardware replacement initiated; affected PSU and management module replaced
08:00 – During recovery validation, an additional issue was discovered in the edge router infrastructure
08:30 – Decision made to accelerate the planned Juniper MX10K migration
09:15 – Configuration adaptation and validation for the new Juniper platform
10:45 – Final row-by-row validation completed
12:22 – All services fully restored
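For readers who want to check the stated duration against the timeline, the following is a minimal sketch that recomputes the elapsed time between milestones. The timestamps are copied from the entries above; the shortened labels and the helper names (`minutes_between`, `format_duration`) are illustrative assumptions, not part of any UnderHost tooling, and all times are assumed to fall on the same calendar day.

```python
from datetime import datetime

# Milestone times copied from the timeline above (Amsterdam local time,
# all on December 17, 2025). Labels are shortened for readability.
TIMELINE = [
    ("06:27", "Initial outage detected"),
    ("06:45", "Engineering team engaged"),
    ("07:15", "Root cause identified (ToR switch PSU)"),
    ("07:30", "Emergency hardware replacement initiated"),
    ("08:00", "Edge router issue discovered"),
    ("08:30", "Decision to accelerate Juniper MX10K migration"),
    ("09:15", "Configuration adaptation and validation"),
    ("10:45", "Final row-by-row validation completed"),
    ("12:22", "All services fully restored"),
]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both given as HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def format_duration(minutes: int) -> str:
    """Render a minute count as 'H hours M minutes'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours} hours {mins} minutes"


# Gap between each milestone and the next one.
step_gaps = [
    minutes_between(current[0], following[0])
    for current, following in zip(TIMELINE, TIMELINE[1:])
]

# Total time from first detection to full restoration.
total_minutes = minutes_between(TIMELINE[0][0], TIMELINE[-1][0])

if __name__ == "__main__":
    for (time, label), gap in zip(TIMELINE, step_gaps):
        print(f"{time}  {label}: next milestone after {gap} min")
    print("Total:", format_duration(total_minutes))
```

The total agrees with the 5 hours 55 minutes stated at the top of this report. The per-step gaps show how the time was spread across the recovery phases.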
ROOT CAUSE ANALYSIS
This incident involved two distinct but related hardware failures:
WHY RESTORATION TOOK 5 HOURS 55 MINUTES
Several factors contributed to the extended recovery time:
CORRECTIVE ACTIONS IMPLEMENTED
Immediate (Completed):
✓ Replacement of the faulty Arista ToR switch PSU and management module
✓ Emergency migration to the Juniper MX10K edge router platform
✓ Full validation of all network paths and routing tables
✓ Enhanced monitoring alerts for power supply and backplane health metrics
Short-Term (Next 30 Days):
Long-Term (Network Roadmap):
COMMUNICATION REVIEW
We acknowledge that initial communications focused primarily on the immediate top-of-rack switch failure while the broader edge router migration was in progress. Going forward, we will:
LESSONS LEARNED
APPRECIATION
We sincerely apologize for the disruption this incident caused to your operations. Our engineering team worked continuously, treating restoration as the highest priority, to bring services back as quickly and safely as possible. We appreciate your patience and understanding during this event.
CONTACT
If you have any questions regarding this incident or observe any lingering issues, please contact our support team.
Sincerely,
The UnderHost Engineering & Operations Team