How P2P and big data could save the set-top box

Netflix dominates the business of streaming Hollywood movies and television into consumers’ homes, but a new business model developed by big data firm Opera Solutions could help give cable companies the inside track. Cable companies have been under fire from services such as Netflix, Roku and Hulu that stream content to existing devices such as gaming consoles, or straight to users’ computers, but their set-top boxes could end up as their saviors.

Thanks to a combination of peer-to-peer networking and big data algorithms, cable or satellite providers could significantly reduce the costs of serving content while also improving the accuracy of personalized recommendations, and the approach relies on the omnipresence of set-top boxes.

Opera employees working under the team name Media Moguls developed the model as part of the company’s annual Opera Open challenge, in which employees compete to create the best product that utilizes the company’s analytics technology. Media Moguls won the competition.

How it works

According to team member Carmine Mangione, chief software architect at Opera, the problem facing most providers of streaming video is one of choice versus costs. While they might want to provide the largest libraries possible, that’s a very expensive proposition. To deliver an effective streaming experience and support features such as rewinding, Mangione explained, content needs to be served from memory rather than from slower, but cheaper, hard disks. On average, a single server could effectively serve about 100 people watching the same movie at the same time.

That’s not too bad a ratio if a provider is hosting a small library, but Mangione said it gets exponentially more expensive as providers try to address the long tail of subscriber interest. More content doesn’t just mean adding more servers; it means adding enough servers to maintain performance levels even in the case of unexpected traffic, which drastically reduces the number of viewers each server actually supports.

In an email, Andrew Grant, Akamai’s senior manager for entertainment industry marketing, described the problem to me this way:

For VOD content, as content libraries get larger – i.e., as more and more of the ‘long-tail’ of video content is made available for consumers to enjoy – the system must be architected to support a projected average amount of streaming per title, and then over-provisioned to support the unpredictable traffic associated with an outlying event. Think when Michael Jackson died and suddenly traffic for old concert footage suddenly spiked. Building a system to support this level of ‘flash crowd’ traffic would require leaving a large amount of server capacity under-utilized a large part of the time.
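To make the over-provisioning argument concrete, here is a rough arithmetic sketch. Only the figure of about 100 viewers per server comes from Mangione; the viewer counts for the example title are invented for illustration, and this is not his actual cost model.

```python
import math

VIEWERS_PER_SERVER = 100  # approximate figure quoted by Mangione


def servers_needed(concurrent_viewers: int) -> int:
    """Servers required to stream one title to this many simultaneous viewers."""
    return math.ceil(concurrent_viewers / VIEWERS_PER_SERVER)


def utilization(average_viewers: int, peak_viewers: int) -> float:
    """Fraction of peak-provisioned capacity that is busy at average load."""
    capacity = servers_needed(peak_viewers) * VIEWERS_PER_SERVER
    return average_viewers / capacity


# Hypothetical long-tail title: 20 viewers on a normal night,
# 5,000 during a flash crowd. Both numbers are assumptions.
print(servers_needed(20))      # 1
print(servers_needed(5000))    # 50
print(utilization(20, 5000))   # 0.004
```

Under those assumed numbers, provisioning for the spike leaves the title's servers about 99.6 percent idle on a typical night, which is the under-utilization Grant describes.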

By Mangione’s math, a 100,000-movie library would cost about $1.4 billion a year to operate, as opposed to about $10,000 for a 10,000-movie library. Akamai’s Grant noted that his company’s CDN (which is one of several that serve Netflix’s streaming service) is built in part to adjust to such spikes by adding capacity as needed. Akamai’s services, of course, aren’t free either.

Mangione realized that cable providers could actually save themselves infrastructure costs and add more content by leveraging the CPU and memory present in their set-top boxes. By creating a P2P network of subscribers’ boxes, providers wouldn’t have to build as much infrastructure because their servers would get hit a lot less. Once subscribers downloaded movies, their set-top boxes and Internet connections would handle most of the subsequent downloads from other subscribers.

But there’s a problem with such a simple approach: movie files are large, and P2P networks can be slow because traffic might have to traverse multiple network hops. Even if they offer fewer titles, streaming services such as Netflix might look more appealing because their use of CDNs allows them to deliver content so much faster.

Enter big data

The solution to this problem, Mangione decided, is to determine subscribers’ particular interests and then create geographically logical clusters of subscribers who share the same interests. The benefits to this approach are twofold:

  1. It decreases the chances of having to hit a provider’s central server because the chances are higher that someone (or multiple people) in a user’s cluster has that title.
  2. It improves the accuracy of recommendations because the system primarily analyzes what people with the same interests are viewing and doesn’t have to filter through as much noise from the overall subscriber base.

Such private networks scale to about 10,000 users, Mangione said.
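The clustering idea can be sketched roughly as follows. This is an illustration under stated assumptions, not Opera's implementation: the `SetTopBox` fields, the single `interest` label per subscriber, and the `"central-server"` fallback name are all hypothetical. Only the cap of about 10,000 users per cluster comes from Mangione.

```python
from collections import defaultdict
from dataclasses import dataclass, field

MAX_CLUSTER_SIZE = 10_000  # approximate scale limit cited by Mangione


@dataclass
class SetTopBox:
    box_id: str
    region: str    # hypothetical coarse location label
    interest: str  # hypothetical dominant-interest label
    titles: set = field(default_factory=set)


def build_clusters(boxes):
    """Group boxes by (region, interest), splitting groups above the size cap."""
    groups = defaultdict(list)
    for box in boxes:
        groups[(box.region, box.interest)].append(box)
    clusters = []
    for members in groups.values():
        for start in range(0, len(members), MAX_CLUSTER_SIZE):
            clusters.append(members[start:start + MAX_CLUSTER_SIZE])
    return clusters


def locate(title, requester, cluster):
    """Return the id of a cluster peer holding the title, else the central server."""
    for peer in cluster:
        if peer is not requester and title in peer.titles:
            return peer.box_id
    return "central-server"


boxes = [
    SetTopBox("a", "brooklyn", "horror", {"Alien"}),
    SetTopBox("b", "brooklyn", "horror"),
    SetTopBox("c", "brooklyn", "comedy", {"Airplane!"}),
]
horror_cluster = build_clusters(boxes)[0]  # boxes "a" and "b"
print(locate("Alien", boxes[1], horror_cluster))      # a
print(locate("Airplane!", boxes[1], horror_cluster))  # central-server
```

The second lookup falls through to the provider's server because the only box holding that title sits in a different interest cluster, which is the case the clustering is meant to make rare.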

Aside from classifying users into groups based on what they watch, Media Moguls captain and Opera principal scientist Frank Williams noted that the system also could determine the genre of movies more accurately based on who’s watching them. Ideally, the algorithms would maximize P2P traffic and minimize server load by accurately pointing subscribers to new content that they want but don’t have, and that’s on other set-top boxes in their clusters.
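One simple way to picture that last step: suggest titles a subscriber lacks, ranked by how many boxes in the same cluster already hold them. The article does not describe Opera's actual algorithms, so the peer-count ranking below is a stand-in, and the `recommend` function and its parameters are hypothetical.

```python
from collections import Counter


def recommend(own_titles, peer_libraries, limit=3):
    """Rank titles the subscriber lacks by how many cluster peers hold them."""
    counts = Counter()
    for library in peer_libraries:
        counts.update(set(library) - set(own_titles))
    # Sort by descending peer count, then by title, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:limit]]


peers = [{"Alien", "The Thing"}, {"Alien", "Halloween"}, {"The Thing", "Alien"}]
print(recommend({"Halloween"}, peers))  # ['Alien', 'The Thing']
```

Favoring titles held by many nearby peers serves both goals at once: the suggestion is likely to match the cluster's shared taste, and the download is likely to come from another set-top box rather than the provider's servers.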

Opera does have some clout when it comes to talking about recommendation algorithms: a team that included several Opera employees came in second place in the Netflix Prize competition in 2009, although it and the winning team finished in a statistical tie.

Is it viable?

It is, of course, up to cable or satellite providers to determine the viability of Opera’s model and implement it, but the signs point to that happening. For one, Opera CEO Arnab Gupta said his company’s plan is to commercialize the technology through a media-industry partner that already has the data, infrastructure and distribution plans in place. Ideally, that would be a cable or satellite provider (Dish Network is ramping up its streaming service), but the technology conceivably could run on an Internet-only device that has adequate hardware specs.

And although set-top boxes might be relics of the past as new boxless viewing methods come into existence, mainstream ubiquity could be a way off. Adopting the Opera method would let providers leverage their existing networks of set-top boxes in the meantime, and its relevancy might well carry into future viewing methods. Whether customers are streaming content from other devices (e.g., game consoles, Roku, etc.) or straight through their TVs, providers still have to worry about the backend costs of storing and delivering all that content.

In any case, licensing content is also a hurdle, but all the hubbub about the Stop Online Piracy and Protect-IP Acts might end up proving to be a blessing in disguise. If the public outcry against these bills is enough to kill them in the House and Senate, it might well serve as a signal to content owners to focus on new, legal distribution channels rather than on lawsuits. Something like this, done through existing cable or satellite partners, could be one good option.

Feature image courtesy of Flickr user PascalSijen; other images courtesy of Opera Solutions.



GigaOM