Investigating issues with web application

Resolved
Updated

Post-mortem

On Thursday at 14:33, we were alerted that our app was down. By 14:35, our internal systems confirmed the issue. We quickly identified that the web server had reached its connection limit, which we increased to restore service.

Further investigation revealed a significant uptick in traffic, suggesting a potential DDoS attack. We also noted disk space issues due to excessive logging. As traffic continued to rise, we suspected a malicious actor and contacted our hosting provider for insights and assistance.

By 15:15, we restricted traffic to known customer IPs, stabilizing the system. We continued adjusting our rate limits and analyzing the attack. At 16:00, we tested reopening the firewall but found that the attack was still ongoing. Consequently, we reverted to restricting traffic to known IPs, although this created issues for users logging in from new IPs.
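As an illustration only, the "restrict traffic to known customer IPs" step could be sketched as follows. This is a minimal sketch and not our actual firewall configuration: the network ranges are reserved documentation addresses, and the names `KNOWN_CUSTOMER_NETWORKS` and `is_allowed` are hypothetical.

```python
import ipaddress

# Hypothetical allowlist. These are reserved documentation ranges,
# not real customer addresses.
KNOWN_CUSTOMER_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(client_ip: str, restrict_to_known: bool = True) -> bool:
    """Return True if a request from client_ip should be let through."""
    if not restrict_to_known:
        # Firewall "open": every client is accepted.
        return True
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed addresses are rejected while the restriction is active.
        return False
    return any(addr in network for network in KNOWN_CUSTOMER_NETWORKS)
```

A check like this also shows why users logging in from new IPs were affected: any address outside the allowlist is rejected until the restriction is lifted.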

In total, the system was unavailable for 45 minutes for users connecting from known IP addresses. For users connecting from previously unknown IP addresses, the downtime lasted longer, but this affected only a very limited number of clients and agents.

We are now focused on improving our DDoS prevention and alert systems. Our investigation confirmed no data breaches; the attack aimed solely to disrupt service availability. If similar attacks occur, we are ready to implement immediate measures to minimize impact.

Thank you for your patience and understanding as we work to enhance our system's resilience.

Resolved

We have now opened traffic for the general public and can confirm the DDoS attack has stopped. We have made changes to our rate limiting and will be closely monitoring and adjusting accordingly.

Thank you for your patience.
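For readers curious about what rate limiting involves, one common approach is a token bucket, sketched below. This is an assumption for illustration: the update above does not state which algorithm or limits we use, and the `TokenBucket` class and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float, now=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self._now = now           # injectable clock, useful for testing
        self.updated = now()

    def allow(self) -> bool:
        """Consume one token if available; return False when the client is over the limit."""
        current = self._now()
        elapsed = current - self.updated
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice one bucket would typically be kept per client IP, so that a single noisy source is throttled without affecting other users.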

Updated

We have identified the root cause as a coordinated DDoS attack on our systems. To minimize the threat, we have blocked all network traffic from unknown IP addresses.

Updates will follow while we work on ways to reopen network traffic for all IP addresses.

We deeply apologize for any inconvenience this may cause.

Identified

We've confirmed there is a problem and are working to resolve it.

Resolved

The service has been restarted and is now fully operational! We are monitoring the service to make sure the problem does not recur. We apologize for any inconvenience.

Investigating

We’re currently investigating reports of a potential service interruption with our web application. We apologize for any inconvenience and will post another update as soon as we learn more.

Service interruptions usually last only a few minutes.

Began at:

Affected components
  • App
    • Web Application (HTTPS)