Partial System Outage in Private Cloud
Incident Report for TrackVia
Postmortem

Outage Report - Private Cloud Routers

May 10, 2022

Incident summary

TrackVia services were unavailable for the one private cloud customer from 7:00 to 7:39 AM MT on May 10, 2022. An investigation revealed that all routers had been disabled, preventing the customer from accessing the site. Operations personnel re-enabled the routers, and full system access was restored by 7:40 AM MT. No customer data was lost during this outage.

Root cause

TrackVia personnel released new routers as part of TrackVia’s vulnerability management program. During the deployment, TrackVia operations personnel inadvertently disabled the routers processing traffic.

Lessons learned

  1. Because of the client’s internet traffic restrictions, TrackVia is unable to use its external uptime monitoring system for this environment.
  2. Disabling routers in the load balancer is a manual step in the procedure, and no technical check prevents the wrong instances from being disabled.
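
The missing enforcement check described in item 2 could look roughly like the following. This is a minimal sketch, not TrackVia's actual tooling: the `Router` record, its fields, and `check_disable` are hypothetical names, and it assumes the deployment tooling can already read each router's state from the load balancer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    """Hypothetical view of one router instance registered in the load balancer."""
    instance_id: str
    enabled: bool          # currently enabled in the load balancer
    serving_traffic: bool  # currently receiving production traffic


class UnsafeDisableError(Exception):
    """Raised when a disable request would take down live routers."""


def check_disable(routers: list[Router], to_disable: set[str]) -> None:
    """Raise UnsafeDisableError unless disabling `to_disable` looks safe."""
    # Reject instance IDs the load balancer does not know about (likely a typo).
    known = {r.instance_id for r in routers}
    unknown = to_disable - known
    if unknown:
        raise UnsafeDisableError(f"unknown instances: {sorted(unknown)}")

    # Reject any target that is still serving traffic.
    live = sorted(
        r.instance_id
        for r in routers
        if r.instance_id in to_disable and r.serving_traffic
    )
    if live:
        raise UnsafeDisableError(f"instances still serving traffic: {live}")

    # Reject a request that would leave no enabled router at all.
    remaining = [
        r for r in routers if r.enabled and r.instance_id not in to_disable
    ]
    if not remaining:
        raise UnsafeDisableError("no enabled routers would remain")
```

Calling `check_disable` before the manual disable step turns an operator mistake into an explicit error instead of an outage. The three conditions are illustrative, and a real check would depend on what state the load balancer exposes.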

Corrective actions

  1. Add automation to the router deployment procedure. (TBD)
  2. Add a sanity check to verify that routers are not active in the load balancer before disabling them. (TBD)
  3. Add internal monitoring for the absence of router traffic. (In Progress)
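
As an illustration of corrective action 3, the sketch below flags a router tier that has gone silent. It is an assumption-laden example rather than the monitoring being built: `traffic_absent` is a hypothetical helper, the five-minute threshold is arbitrary, and it assumes an internal job can obtain the timestamp of the most recent request handled by the routers.

```python
from datetime import datetime, timedelta
from typing import Optional


def traffic_absent(
    last_request_at: Optional[datetime],
    now: datetime,
    max_silence: timedelta = timedelta(minutes=5),  # assumed threshold
) -> bool:
    """Return True when the routers have been silent longer than `max_silence`."""
    # No request has ever been observed: treat this as absent traffic.
    if last_request_at is None:
        return True
    return now - last_request_at > max_silence
```

Because the check runs inside the private cloud, it does not depend on the external uptime monitoring that the client's traffic restrictions rule out. A silence threshold suits an environment with steady traffic; a quiet environment would need a longer window or a synthetic request to avoid false alarms.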
Posted May 10, 2022 - 13:34 MDT

Resolved
TrackVia services were unavailable for the one private cloud customer from 7:00 to 7:39 AM MT on May 10, 2022. An investigation revealed that all network routers had been disabled, preventing the customer from accessing the site. Operations personnel re-enabled the routers, and full system access was restored by 7:40 AM MT. No customer data was lost during this outage.
Posted May 10, 2022 - 07:00 MDT