Ongoing Connectivity Issues at Netherlands Data Center
Incident Report for Networks Status
Postmortem

During routine operations, one of our primary edge routers failed completely. This was not a typical component failure such as a line card or management card, both of which are designed with redundancy in mind. Instead, an ASIC (Application-Specific Integrated Circuit) malfunction left the entire chassis non-operational.

The first symptom was significant packet loss. In response, we switched to the alternate management card, but this did not resolve the issue. We then attempted a cold reload (power removal), after which the ASIC failed to initialize.

Restoring service required us to migrate more than 200 100GigE ports to an alternative switch. Our team worked continuously to mitigate the impact and restore full functionality as quickly as possible. I am pleased to report that our network is now stable and operational once again.

We understand how critical network reliability is for our clients, and I extend our deepest apologies for any inconvenience this incident may have caused. Although this was a rare event, I assure you that our team was fully focused on resolving it as quickly as possible. The scale and complexity of the incident, however, meant that it could not be resolved immediately.

We appreciate your understanding and patience during this unexpected outage. Our commitment to providing reliable, high-quality service remains as strong as ever, and we are taking steps to further enhance our system's resilience against such rare occurrences.

Thank you for your continued trust and support. If you have any further questions or concerns, please do not hesitate to reach out.

Posted Oct 27, 2023 - 19:12 PDT

Resolved
Our network is now stable and operational once again. Full details are in the postmortem above.
Posted Oct 27, 2023 - 19:11 PDT
Investigating
We are investigating an unexpected connectivity issue at our Netherlands offshore data center. Non-responsive racks are impacting a number of our services. Our technical team is on site working to resolve the issue. We do not yet have an estimated time for full restoration, but we are committed to resolving this as quickly as possible. We will post updates as more information becomes available. We appreciate your patience and understanding during this time.
Posted Oct 27, 2023 - 15:54 PDT
This incident affected: Netherlands.