Twitter’s human-powered search is the present and future of AI

Twitter has done a lot to change the way people share news, watch live TV and even protest. Along the way it has had to deal with scaling problems, as well as the challenge of tracking how users connect to each other and then delivering the right tweets to the right streams instantly, or at least very quickly.

To do this, it has built some impressive tools such as Scalding, Storm and FlockDB. Now, as it attempts to make money from its popular service by selling ads, it faces a problem that has vexed people who work with computers for decades: how do you get a machine to understand what people are saying or typing? Not just to recognize the words (although in speech recognition that can still be tough), but to understand that when people talk about “cancer,” they might be referring to a disease or an astrological sign.

Many people view this as a problem for machines, another step on the path to true artificial intelligence, and have mounted elaborate efforts to translate human thought into something a machine can parse. But on Twitter, where an inane comment during a presidential debate can spawn a #bindersfullofwomen hashtag, how does a computer recognize what that hashtag means and what type of ads to place against it?

The answer is that it doesn’t. So Twitter instead turns to real people, via an automated process it described in a blog post published Tuesday. It has essentially automated queries to Amazon’s Mechanical Turk service, which farms out small jobs to real people.

From the Twitter post:

Suppose that our Storm topology has detected that the query [Big Bird] is suddenly spiking. Since the query may remain popular for only a few hours, we send it off to live humans, who can help us quickly understand what it means; this dispatch is performed via a Thrift service that allows us to design our tasks in a web frontend, and later programmatically submit them to Mechanical Turk using any of the different languages we use across Twitter.

On Mechanical Turk, judges are asked several questions about the query that help us serve better ads.
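The division of labor in that excerpt (machines notice that a query is spiking, people explain what it means) can be sketched in a few lines. The Python below is a minimal illustration under our own assumptions, not Twitter's code: the spike test (current count compared with the mean of recent intervals), the thresholds, the `SpikeDetector`, `HumanTask` and `dispatch` names and the sample questions are all hypothetical, and a plain list stands in for the Thrift service and Mechanical Turk.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SpikeDetector:
    """Hypothetical spike test: flag a query whose count jumps well above its recent mean."""
    window: int = 6        # assumed number of past intervals kept as the baseline
    ratio: float = 3.0     # assumed threshold: current count vs. baseline mean
    min_count: int = 50    # assumed floor so very rare queries are ignored
    history: dict[str, deque] = field(default_factory=dict)

    def observe(self, query: str, count: int) -> bool:
        past = self.history.setdefault(query, deque(maxlen=self.window))
        if not past:
            past.append(count)
            return False  # no baseline yet, so nothing to compare against
        baseline = sum(past) / len(past)
        past.append(count)
        if count < self.min_count:
            return False
        return baseline == 0 or count >= self.ratio * baseline


@dataclass
class HumanTask:
    """A question set for human judges (stand-in for a Mechanical Turk task)."""
    query: str
    questions: list[str]


# Hypothetical questions; the post only says judges answer "several questions".
QUESTIONS = [
    "What does this query refer to right now?",
    "Which product or ad category fits it best?",
]


def dispatch(detector: SpikeDetector, counts: dict[str, int], queue: list[HumanTask]) -> None:
    """Send every query the detector flags as spiking off to people for interpretation."""
    for query, count in counts.items():
        if detector.observe(query, count):
            queue.append(HumanTask(query, list(QUESTIONS)))


detector = SpikeDetector()
queue: list[HumanTask] = []
quiet = {"big bird": 10, "weather": 500}
for _ in range(3):
    dispatch(detector, quiet, queue)
dispatch(detector, {"big bird": 400, "weather": 520}, queue)
print([task.query for task in queue])  # prints ['big bird']
```

The detector only flags; deciding what "big bird" means is left entirely to whoever picks up the task, which mirrors the split the post describes: the steady, high-volume "weather" query never reaches a person.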

There has been a lot of effort to mimic the human brain in computers, but perhaps the best way to take advantage of people is to recognize what they do well and find cheap ways to tap our brains’ computational power: not by replicating it in silicon, but by using computers to outsource each task to whichever people are most appropriate, cheapest or nearest. Twitter has done this, but it’s not alone.

Photo: Douglas Merrill of ZestFinance at Structure:Data 2012. (c) 2012 Pinar Ozger.

For example, ZestFinance, a company that uses data analysis to decide whom to offer credit, tweaked its credit-scoring model to bring in humans who help determine which variables really matter in a particular person’s score. All told, about 25 percent of the variables the company analyzes are the result of human intervention, but it’s the mix of human judgment and existing data analytics that makes the approach so powerful.

Another example of this combination is Gravity Labs, which uses people plus machine learning to construct interest graphs. When you combine people with better databases, faster computers or task-optimized systems like Siri or Watson, you get a more realistic version of artificial intelligence than a self-learning, thinking robot. It also drives home one of the more subtle aspects of the big data revolution that my colleague Derrick Harris pointed out earlier this month: in many cases, data analytics is more about automation than insights.

Some of the best uses of advanced databases or data visualizations are in narrowing down what might be thousands or millions of variables into something a person can assess and then act on. In Twitter’s case, the computers can recognize that a hashtag’s popularity is spiking, but a quick call to a person can tell you why, and what that hashtag means, far more quickly and cheaply than a computer could. But a person can’t filter through billions of tweets to see what’s spiking and what isn’t.

Thus perhaps the most practical AI — for now — is recognizing what humans can do, and getting them the best, most compact information that allows them to make their decisions. Why replicate the brain if you don’t have to?


GigaOM