On July 2nd at 6:47 AM MT, TrackVia Support reported an increase in support tickets related to application scripts. A quick investigation attributed the issue to a recent software release. Reverting the application change corrected the issue, and full functionality was restored by 8:00 AM MT. The full timeline is detailed below.
A number of accounts encountered failed transactions and corresponding error messages during the incident window. No data corruption was identified as a result of the incident.
Scope | Value |
---|---|
Count of Accounts Impacted | 84 |
Count of Failed Transactions | 3212 |
% Total Traffic Impacted | 0.04% |
A new application script threading model introduced in the 23.43 release was designed to manage app script compilation and execution in a single-threaded manner, per application server. This design was intended to mitigate a long-standing issue. As part of the architectural changes, we also provided a fallback mechanism through an application configuration value that would allow us to utilize the legacy threading logic should any unexpected errors occur.
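The design described above can be illustrated with a minimal sketch. It is not TrackVia's implementation: the configuration key `appscript.use_legacy_threading`, the `ScriptRunner` class, and the legacy pool size are all hypothetical names and values chosen for illustration. It only shows the general pattern of a single-worker executor with a configuration switch that falls back to a multi-threaded pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical configuration key; the real parameter name is not given in this report.
LEGACY_FLAG = "appscript.use_legacy_threading"


class ScriptRunner:
    """Runs app scripts on one worker thread unless the legacy flag is set."""

    def __init__(self, config, legacy_workers=8):
        self.legacy = bool(config.get(LEGACY_FLAG, False))
        # New model: a single worker serializes all script work on this server.
        # Legacy model: a pool of workers (the pool size here is illustrative).
        workers = legacy_workers if self.legacy else 1
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def run(self, script, *args):
        """Submit a script callable and block until it returns."""
        return self._pool.submit(script, *args).result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


def current_thread_id():
    """Stand-in for a script; reports which worker thread executed it."""
    return threading.get_ident()
```

With an empty configuration, every call to `runner.run(...)` executes on the same worker thread. Passing `{LEGACY_FLAG: True}` selects the multi-threaded path without any code change, which is the kind of override a fallback configuration value makes possible.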
The ultimate root cause is still under investigation. We believe we are running into a contention scenario related to a singleton value shared across application server instances and threads.
TrackVia Operations added the fallback configuration parameter, forcing the threading logic to revert to the legacy handling routines, which remained unchanged in the codebase.
New application servers were deployed utilizing the configuration override, restoring full service.