Netflix reveals the secrets of its big data analysis
OTT broadcaster Netflix has taken time away from its spat with US telco Verizon over alleged traffic throttling to discuss how it uses big data for more than just viewer recommendations.
The firm is expanding rapidly, delivering over 1 billion hours of streaming per month to 48 million users in more than 40 countries, a far cry from its original business model of sending DVDs through the post. According to Sandvine, it accounts for 34 per cent of peak Internet traffic in the US, which is why it is often at loggerheads with telcos and ISPs.
Writing on the company’s blog this week, Nirmal Govind, director of streaming science and algorithms for Netflix, says that the company uses big data for deep analysis and predictive algorithms, not just for movie recommendations but also for the streaming quality of experience (QoE). This refers to the user experience once he or she presses play on a Netflix video.
Netflix switched its architecture to its own Open Connect CDN to optimise streaming quality, and is currently putting together a new team to focus on what Govind calls “streaming science”. This involves creating a personalised streaming experience for each user, determining what videos to cache on edge servers based on viewing behaviour, and improving the technical quality of the content.
Netflix intends to create a mapping function that can quantify and predict how changes in QoE metrics affect user behaviour, so that it can tailor the algorithms that determine QoE and improve the aspects that have a significant impact on users’ viewing experience.
“With vast amounts of data, the mapping function discussed above can be used to further improve the experience for our members at the aggregate level, and even personalize the streaming experience based on what the function might look like based on each member's ‘QoE preference’,” writes Govind. “Personalization can also be based on a member's network characteristics, device, location, etc. For example, a member with a high-bandwidth connection on a home network could have very different expectations and experience compared to a member with low bandwidth on a mobile device on a cellular network.”
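To make the idea of a mapping function concrete, here is a minimal sketch that fits a straight line from one QoE metric to one behaviour measure. It is not Netflix's method: the metric (rebuffer ratio), the behaviour (minutes watched), the names and the sample numbers are all invented for illustration, and a real model would use many metrics and far more data.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Both fields are hypothetical; the article does not name specific metrics.
    rebuffer_ratio: float   # fraction of session time spent rebuffering
    minutes_watched: float  # the user behaviour we want to predict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented sample data, chosen to lie on a straight line.
sessions = [
    Session(0.00, 50.0),
    Session(0.02, 46.0),
    Session(0.04, 42.0),
    Session(0.06, 38.0),
]

intercept, slope = fit_line(
    [s.rebuffer_ratio for s in sessions],
    [s.minutes_watched for s in sessions],
)


def predict_minutes(rebuffer_ratio):
    """Predicted viewing time for a given rebuffer ratio."""
    return intercept + slope * rebuffer_ratio
```

A negative slope here would mean that more rebuffering goes with less viewing. Fitting the same function separately per device type or network type is one simple way to approximate the per-member personalisation Govind describes.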
Another set of big data problems exists on the content delivery side. Open Connect is Netflix's own content delivery network that allows ISPs to directly connect to Netflix servers at common internet exchanges, or place a Netflix-provided storage appliance (cache) with Netflix content on it at ISP locations. By locating content closer to users, with fewer network hops, the viewing experience should be better.
“With millions of members, a large catalogue, and limited storage capacity, how should the content be cached to ensure that when a member plays a particular movie or show, it is being served out of the local cache/appliance?” asks Govind.
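One simple way to frame the caching question is as a packing problem: given limited storage, pick the titles expected to serve the most plays per gigabyte. The sketch below uses a greedy heuristic for that; it is an illustrative assumption, not a description of how Open Connect actually decides what to cache, and the titles, play counts and sizes are made up.

```python
def choose_titles_to_cache(titles, capacity_gb):
    """Greedily pick titles by predicted plays per GB until storage runs out.

    `titles` is a list of dicts with hypothetical keys:
    "name", "plays" (predicted local plays) and "size_gb".
    """
    ranked = sorted(titles, key=lambda t: t["plays"] / t["size_gb"], reverse=True)
    chosen = []
    used_gb = 0.0
    for title in ranked:
        # Skip anything that no longer fits; smaller titles may still fit later.
        if used_gb + title["size_gb"] <= capacity_gb:
            chosen.append(title["name"])
            used_gb += title["size_gb"]
    return chosen


# Invented catalogue for one cache appliance.
catalogue = [
    {"name": "A", "plays": 900, "size_gb": 3.0},
    {"name": "B", "plays": 500, "size_gb": 1.0},
    {"name": "C", "plays": 400, "size_gb": 4.0},
    {"name": "D", "plays": 100, "size_gb": 2.0},
]

cached = choose_titles_to_cache(catalogue, capacity_gb=5.0)
```

The greedy choice is not guaranteed to be optimal, and the hard part in practice is the input: predicting plays per title for each location from viewing behaviour.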
Other problems concern the quality of the content itself. Netflix augments its own quality checks with user feedback, which is unstructured and difficult to process. That is why the company is building data models to help identify quality issues.
“Machine learning models along with natural language processing and text mining techniques can be used to build powerful models to improve the quality of content that goes live,” said Govind. “As we expand internationally, this problem becomes even more challenging with the addition of new movies and shows to our catalogue and the increase in number of languages.”
Netflix vs Verizon
Returning to the spat with Verizon for a moment: an interesting post from Dan Rayburn suggests that Netflix’s argument doesn’t stack up, and that without transparent data it’s hard to prove that Verizon degraded Netflix’s performance by congesting their peering points.
“CDNs like Akamai, Limelight and Level 3 successfully managed the majority of all of Netflix’s video and were responsible for Netflix customer performance, and successfully delivered Netflix via all the same transit paths and business relationships equally available to Netflix today,” said Rayburn. “When Netflix took over the routing controls for their video traffic with their own CDN Open Connect, customer performance began to suffer.”
There’s a supporting graphic on his site that demonstrates this, using Netflix’s own data that it shared with the Washington Post.
And if you want to learn more about streaming video and peering, this is recommended weekend reading.