Defending Westeros from Performance Outages

Jul 14, 2017

Posted by Chris Colosimo

On Sunday, the Game of Thrones season 7 premiere will become available for millions of rabid fans. Last season’s finale was watched by over 8.9 million people (a record for HBO), and apparently if you take into account all of the different types of media formats (streaming, live, DVR, and reruns), that single episode had about 23 million viewers in total.

The season 7 premiere has been highly anticipated and hyped, as the show continues to build and its plotlines come to a head. Will Jon Snow be the King in the North? What’s going on with Dany and that army across the sea? And most importantly, will Tyrion Lannister finally get the respect that he deserves? (I don’t know, maybe I’m biased, but he’s definitely the best thing about that show…)

But I have this little twinge in the back of my mind when I think about 23 million viewers simultaneously trying to access the same show from HBO. What kind of infrastructure did they put in place? And, more importantly, how did they performance test the scenario? What if we were all 30 minutes into the episode and suddenly there was an outage?

Come at me, Performance Testers!

I personally watch Game of Thrones on the HBO Go app, streaming to my TV, so I thought I might do some research to see what’s under the hood in this particular use case.

Here’s what I found out about the HBO Go app:

  • It’s primarily Java-based
  • Uses Cassandra as its primary data source
  • Communicates back home via REST APIs
  • Posts content for streaming on Amazon EC2, in partnership with MLB Advanced Media

According to the swells of internet knowledge, HBO originally planned to build the streaming service in-house under the codename Project Maui. This was around 2014, but they ran into some challenges, including outages that occurred during episodes of GoT and True Detective. (It was alleged at the time that some developers knew about a possible memory leak but chalked it up as a non-issue, and it was unfortunately determined that the leak eventually led to the outages.)

Following these issues, HBO partnered with MLB Advanced Media, which now handles the streaming. So what would it look like to adequately performance test that infrastructure, to ensure they don’t have a catastrophic outage on Sunday?

Here is a diagram modeling what I would guess the stack looks like, and where the appropriate test types would fit in:

Figure: Parasoft Environment Manager meets Game of Thrones

I would start by recreating the login, search, and selection workflows. I would test the REST API calls made from the application server to the services, and I would validate the functional calls to the backend databases. I would then reuse those calls to create performance and load tests against the individual components, to make sure that each one performs well in isolation and doesn’t become saturated under load. Next, I would record the user’s experience in both the mobile application and the browser, and repurpose those recordings as performance tests. It would be important to aggregate the two sets of results while monitoring the underlying tech for thread counts, memory leaks, CPU utilization, etc. This would help them understand where the potential hotspots lie in the application stack.
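To make the "reuse those calls as load tests" step a bit more concrete, here is a minimal Python sketch of a component-level load test. It is only an illustration, not how HBO or any particular tool does it: `fake_login_call` is a hypothetical stand-in for one recorded REST call (it just sleeps for about 10 ms), and `run_load_test`, its parameters, and the summary fields are names I made up. In a real test you would swap the stand-in for an actual HTTP request against the service under test.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_login_call():
    """Hypothetical stand-in for one REST call; replace with a real request."""
    time.sleep(0.01)  # simulate roughly 10 ms of service latency
    return 200


def timed_call(call):
    """Run one call and return (status, elapsed_seconds); failures get status None."""
    start = time.perf_counter()
    try:
        status = call()
    except Exception:
        status = None
    return status, time.perf_counter() - start


def run_load_test(call, total_requests, concurrency):
    """Issue total_requests calls across `concurrency` threads and summarize latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_call(call), range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    errors = sum(1 for status, _ in results if status != 200)
    # Nearest-rank 95th percentile of the sorted latencies
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "requests": len(results),
        "errors": errors,
        "median_s": statistics.median(latencies),
        "p95_s": latencies[p95_index],
        "max_s": latencies[-1],
    }


if __name__ == "__main__":
    # 200 simulated logins, 20 at a time
    print(run_load_test(fake_login_call, total_requests=200, concurrency=20))
```

Running the same function at increasing `concurrency` values and watching where the 95th-percentile latency or the error count starts to climb gives a first, rough idea of where a single component saturates, before you move on to the end-to-end tests.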

Conclusion?

Performance testing sometimes gets overlooked, but it’s easy with the right solutions. Neglect this and you could have a pack of angry wildlings coming after you when their stream suddenly stops.
