Partial Outage of the “Waterfall” Node

Resolved

Degraded performance

Started 31/01/2023 at 6:47 AM. Lasted 33 minutes.

Updates

Resolved
31/01/2023 at 7:20 AM
The issue has been resolved; the node is responding to requests normally.

To catch any future occurrence of this issue early, an anomaly notification system has been added to the infrastructure stack and tested.
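This update does not describe how the notification system works. Purely as an illustrative sketch (not the actual implementation), an alert aimed at the failure mode identified in this incident, a slow memory leak that eventually exhausts RAM, could combine two checks: available memory falling below a fraction of total memory (read from Linux `/proc/meminfo`), and a steady upward trend in a process's resident memory (for example, samples of the `VmRSS` field from `/proc/<pid>/status`). The function names and both thresholds below are hypothetical.

```python
from __future__ import annotations

# Hypothetical thresholds; not taken from this incident report.
MIN_AVAILABLE_FRACTION = 0.10  # alert when less than 10% of RAM is available
MAX_GROWTH_MB_PER_HOUR = 5.0   # alert when RSS grows faster than this on average


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse the content of Linux /proc/meminfo into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def low_memory(meminfo: dict[str, int]) -> bool:
    """True when MemAvailable is below the configured fraction of MemTotal."""
    return meminfo["MemAvailable"] < MIN_AVAILABLE_FRACTION * meminfo["MemTotal"]


def growth_rate_mb_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (hours, rss_mb) samples, in MB per hour."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def check(meminfo_text: str, rss_samples: list[tuple[float, float]]) -> list[str]:
    """Return alert messages; an empty list means no anomaly was detected."""
    alerts = []
    if low_memory(parse_meminfo(meminfo_text)):
        alerts.append("low available memory")
    if growth_rate_mb_per_hour(rss_samples) > MAX_GROWTH_MB_PER_HOUR:
        alerts.append("steady memory growth (possible leak)")
    return alerts


if __name__ == "__main__":
    # Made-up example data: 8 GB total RAM with only 400 MB available.
    sample_meminfo = (
        "MemTotal:       8000000 kB\n"
        "MemFree:         100000 kB\n"
        "MemAvailable:    400000 kB\n"
    )
    # Database process RSS sampled every 12 hours, growing by 10 MB per hour.
    samples = [(0, 1000), (12, 1120), (24, 1240), (36, 1360)]
    print(check(sample_meminfo, samples))
    # prints ['low available memory', 'steady memory growth (possible leak)']
```

The trend check is what distinguishes a gradual leak from a process that simply uses a lot of memory: a leak like the one described here shows up as a persistent positive slope days before free memory actually runs out.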
Monitoring
31/01/2023 at 7:05 AM
A fix has been implemented: the node stack has been reloaded.
Identified
31/01/2023 at 7:02 AM
Abnormal CPU and disk usage has been identified on the Waterfall node.

The root cause has been identified as a MariaDB process that gradually leaked memory over the past several days, eventually leaving no RAM available for other processes.
Investigating
31/01/2023 at 6:47 AM
We are currently investigating this incident.
At this time, the Waterfall node appears to be unresponsive to some requests.

cmld. - Partial Outage of the “Waterfall” Node – Incident details