Kumu Blog

Tools and practices for tackling complex systems.

An update on Kumu's recent downtime

If you were building maps on Monday morning, you likely experienced slow load times and even periods where Kumu wouldn't load at all. We apologize for this and know it can be both frustrating and scary, especially when you're faced with presenting a project that won't load. We take Kumu's reliability seriously and want to share the results of our investigation, along with what we've done to guard against this in the future.

What caused the downtime?

No, Ryan's children did not get a hold of his computer and take turns banging on the servers. A handful of Kumu maps with large images were embedded on high-traffic news sites, which dramatically increased the traffic hitting our servers. For an hour or two, the traffic was high and constant enough to max out the number of connections our app servers can handle. No data was lost, but response times were unacceptably slow and far too many requests timed out during this period.

Despite what you may have been thinking, this is not actually what happened.

What are you doing to prevent this in the future?

After identifying the cause, we took a few immediate steps to protect against future embed-induced traffic spikes:

The first step was to add more servers to help handle the immediate load. Although this didn't address the root cause, it did buy us some time.

The second step was to disable the image proxy for embeds. Proxying images places a heavy burden on our servers, and the only thing it gives us is the ability to capture screenshots. Since screenshots aren't needed for embeds, there's no need to proxy their images. Disabling the proxy significantly lowered the amount of traffic our servers had to handle to serve the embeds.
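For the curious, here is a rough sketch of the idea in Python. It is not our actual code: the `/image-proxy` path and the `image_src` helper are made up for illustration. The point is simply that an embed can link straight to the original image, while the editor (where screenshots matter) routes it through the proxy.

```python
from urllib.parse import quote

# Hypothetical proxy endpoint, used here only for illustration.
PROXY_PREFIX = "/image-proxy?url="


def image_src(original_url: str, is_embed: bool) -> str:
    """Return the URL the browser should load an image from."""
    if is_embed:
        # Embeds never take screenshots, so the browser can fetch the
        # image directly from its original host and skip our servers.
        return original_url
    # Elsewhere the image goes through the proxy so screenshots work.
    return PROXY_PREFIX + quote(original_url, safe="")
```

With something like this in place, every image request coming from an embed goes to the image's own host instead of landing on the app servers.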

The third and final step was to redesign the way embeds work so they can be hosted on a CDN (a content delivery network; we're using Amazon's CloudFront, if you're curious). All embed traffic is now handled by CloudFront with zero impact on our servers and core app traffic. In addition, embeds are now cached hourly, meaning changes to the underlying map driving an embed can take up to one hour to show up on the embed link. The addition of this caching layer is the key to preventing high-traffic embeds from affecting Kumu's uptime, and it has the added benefit of making embeds load much faster.
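If you're wondering what "cached hourly" looks like in practice, one common way to get that behavior from a CDN is to have the origin send a `Cache-Control` header with a one-hour `max-age`. The sketch below assumes that approach; the function names are invented for illustration and this is not a description of our exact configuration.

```python
# One hour, in seconds. Assumed TTL matching the hourly caching described above.
EMBED_TTL_SECONDS = 60 * 60


def embed_cache_headers(ttl_seconds: int = EMBED_TTL_SECONDS) -> dict[str, str]:
    """Headers an origin could send so a CDN keeps an embed for ttl_seconds."""
    return {"Cache-Control": f"public, max-age={ttl_seconds}"}


def is_fresh(cached_at: float, now: float, ttl_seconds: int = EMBED_TTL_SECONDS) -> bool:
    """True while a cached copy is younger than the TTL and can be served as-is."""
    return now - cached_at < ttl_seconds
```

While a cached copy is fresh, the CDN answers requests on its own, so a popular embed reaches the origin roughly once per hour per CDN location no matter how many people view it. The trade-off is the delay mentioned above: an edit made just after a copy is cached won't appear until that copy expires.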

We can't thank you enough for all your patience and support as we worked through these issues on Monday. If any of your maps seem to have been affected by the downtime and you'd like a hand with them, please email us at support@kumu.io and we'll be happy to help.

- Team Kumu
