Outage Report - Private Cloud Routers
May 10, 2022
Incident summary
TrackVia services were unavailable for one private cloud customer from 7:00 to 7:39 AM MT on May 10, 2022. An investigation revealed that all routers had been disabled, preventing the customer from accessing the site. Operations personnel enabled the routers, and full system access was restored by 7:40 AM MT. No customer data was lost during this outage.
Root cause
TrackVia personnel released new routers as part of TrackVia’s vulnerability management program. During the deployment, operations personnel inadvertently disabled the routers that were processing traffic.
Lessons learned
- Because of the client’s internet traffic restrictions, TrackVia is unable to use the external uptime monitoring system.
- Disabling routers in the load balancer is a manual step in the procedure, and no technical check prevents the wrong instances from being disabled.
Corrective actions
- Add further automation to the router deployment. (TBD)
- Add a sanity check to verify that routers are not active in the load balancer before they are disabled. (TBD)
- Add internal monitoring for the absence of router traffic. (In Progress)
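
The sanity check and the traffic-absence monitor listed above could look roughly like the following. This is a minimal sketch, not TrackVia's implementation: the Router record, the check_safe_to_disable and traffic_is_absent functions, and the five-minute silence threshold are all hypothetical, and in practice the active flag and the last-request timestamp would come from the load balancer and router logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Router:
    """Hypothetical view of a router instance as reported by the load balancer."""
    name: str
    active_in_load_balancer: bool  # True if the load balancer routes traffic to it


class ActiveRouterError(RuntimeError):
    """Raised when a disable request includes routers that still serve traffic."""


def check_safe_to_disable(targets: Iterable[Router]) -> List[str]:
    """Return the names to disable, or raise if any target is still active."""
    targets = list(targets)
    active = [r.name for r in targets if r.active_in_load_balancer]
    if active:
        raise ActiveRouterError(
            "Refusing to disable routers still active in the load balancer: "
            + ", ".join(active)
        )
    return [r.name for r in targets]


def traffic_is_absent(
    last_request_at: Optional[datetime],
    now: datetime,
    max_silence: timedelta = timedelta(minutes=5),  # assumed threshold
) -> bool:
    """Return True when no router traffic has been seen within max_silence."""
    if last_request_at is None:
        return True  # no traffic ever recorded counts as absent
    return now - last_request_at > max_silence
```

Raising an error on the whole request, rather than silently skipping the active routers, forces the operator to review the target list before anything is disabled.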