Defending Westeros from Performance Outages
On Sunday, the Game of Thrones season 7 premiere will become available for millions of rabid fans. Last season’s finale was watched by over 8.9 million people (a record for HBO), and apparently if you take into account all of the different types of media formats (streaming, live, DVR, and reruns), that single episode had about 23 million viewers in total.
The season 7 premiere has been highly anticipated and hyped, as the show continues to build and its plotlines come to a culmination. Will Jon Snow be the King in the North? What’s going on with Dany and that army across the sea? And most importantly, will Tyrion Lannister finally get the respect that he deserves? (I don’t know, maybe I’m biased, but he’s definitely the best thing about that show…)
But I have this little twinge in the back of my mind when I think about 23 million viewers simultaneously trying to access the same show from HBO. What kind of infrastructure did they put in place? And, more importantly, how did they performance test the scenario? What if we were all 30 minutes into the episode and suddenly there was an outage?
I personally watch Game of Thrones on the HBO Go app, streaming to my TV, so I thought I might do some research to see what’s under the hood in this particular use case.
Here’s what I found out about the HBO Go app:
- It’s primarily Java-based
- Uses Cassandra as its primary data source
- Communicates back home via REST APIs
- Posts content for streaming on Amazon EC2 in a partnership with MLB Advanced Media
According to the swells of internet knowledge, HBO originally planned to build the streaming service in-house under the codename Project Maui. This was around 2014, but they had some challenges, including outages that occurred during episodes of Game of Thrones and True Detective. (It was alleged at the time that some developers knew about a possible memory leak but chalked it up as a non-issue, and it was unfortunately determined that those leaks eventually led to the outages.)
To put the importance of these types of issues in context, HBO's CTO resigned in the wake of the Project Maui issues, many of which were related to the memory leaks that led to outages. The failure of their own custom-made platform led to a wholesale change in direction for the platform team, resulting in a move to a third-party solution. One can't overestimate the impact of quality, performance, and security in mission-critical infrastructure.
Following these issues, HBO partnered with MLB Advanced Media, which now handles the streaming. So what would it look like to adequately performance test that infrastructure, to ensure they don’t have a catastrophic outage on Sunday?
Here is a diagram modeling what I would guess the stack looks like, and where the appropriate test types would fit in:
I would start by recreating the login, search, and selection workflows. I would test the REST API calls made from the application server to its services, and I would validate the functional calls to the backend databases. I would then reuse those calls to create performance and load tests against the individual components, to make sure that each one performs in isolation and does not become saturated under load. Next, I would record the user’s experience in both the mobile application and the browser, and repurpose those recordings as performance tests. It would be important to aggregate the results from both while monitoring the underlying tech for thread counts, memory leaks, CPU utilization, and so on. This would help them understand where the potential hotspots lie in the application stack.
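To make the "reuse a functional call as a load test" idea a little more concrete, here is a minimal sketch in Python. It is not how HBO or any particular tool does it; the function names (`run_load_test`, `summarize`) and the `fake_login` stand-in are hypothetical, and the stand-in just sleeps instead of calling a real REST endpoint. The idea is simply to wrap one functional call, fire it from many threads at once, and look at latency percentiles and error rate.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def timed_call(call):
    """Run one call and return (elapsed_seconds, succeeded)."""
    start = time.perf_counter()
    try:
        call()
        ok = True
    except Exception:
        ok = False
    return time.perf_counter() - start, ok


def run_load_test(call, concurrency, total_requests):
    """Issue total_requests calls using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: timed_call(call), range(total_requests)))


def summarize(results):
    """Reduce raw (latency, ok) pairs to a small report."""
    latencies = [elapsed for elapsed, _ in results]
    failures = sum(1 for _, ok in results if not ok)
    cuts = quantiles(latencies, n=100)  # 99 percentile cut points
    return {
        "requests": len(results),
        "error_rate": failures / len(results),
        "p50_s": cuts[49],
        "p95_s": cuts[94],
    }


def fake_login():
    """Hypothetical stand-in for a functional test step (e.g. a login call)."""
    time.sleep(0.01)


if __name__ == "__main__":
    report = summarize(run_load_test(fake_login, concurrency=20, total_requests=200))
    print(report)
```

In a real setup you would swap `fake_login` for the actual login, search, or selection call, run it against one component at a time, and watch threads, memory, and CPU on the server side while the test runs.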
Performance testing sometimes gets overlooked, but it’s easy with the right solutions. Neglect this and you could have a pack of angry wildlings coming after you when their stream suddenly stops.
If you want to copy what I did in that diagram, to leverage an environments-based approach to testing, you can use Parasoft SOAtest for your functional test automation, and Parasoft Virtualize for service virtualization. These technologies connect seamlessly to make your testing easy to manage.
A Product Manager at Parasoft, Chris strategizes product development of Parasoft’s functional testing solutions. His expertise in SDLC acceleration through automation has taken him to major enterprise deployments, such as Capital One and CareFirst.