Content Farms 2.0: Can Robots Help Write the News?

In the media industry right now, there are few more terrifying words than “content farm” — the term for sites that thrive by producing vast amounts of cheap, disposable content. From a traditional media business perspective, everything about content farms seems terrible: Low-paid staff churn out low-quality content that’s often devoid of useful information, and typically aimed not at enchanting readers but at scooping up as many of Google’s advertising dollars as possible.

It’s big money: Demand Media went public recently, and AOL’s leading the charge with its low-cost Seed.com service and, arguably, its $315 million buyout of the Huffington Post. Journalists roll their eyes at this stuff, but content farms — for better or worse — are hard to ignore right now, driving down prices and flooding the market with stories.

But what if you took the content farm model even further — by crowdsourcing everything? That’s the objective of My Boss Is a Robot, an experimental exercise to see how automated the process of journalism can become. It’s being overseen by two San Francisco-based journalists, Jim Giles and MacGregor Campbell, who want to understand what automation and crowdsourcing mean for journalism in the long term.

Specifically, the duo want to see if they can “create an automated system for producing quality journalism using Mechanical Turk’s army of untrained workers.” This involves breaking down the familiar job of journalism — in this case, producing a 500-word report on a relatively straightforward scientific paper — into little pieces, and then getting those pieces completed through Amazon’s Mechanical Turk service.

“We were interested in what you might call crowdsourcing 2.0,” Giles — who has written for outlets including New Scientist, the Economist and the New York Times — told me. “Rather than do simple, one-off tasks on Mechanical Turk, could you break down a big task into lots of small ones? We kind of feel like people are close to doing it already.”

Working alongside a team from Carnegie Mellon, they are trying to develop systems to test whether it’s possible. One job might involve reading a few paragraphs to determine the most important fact in a research paper. Another would be identifying researchers working in similar areas, then narrowing that list to the most appropriate ones, and then emailing them to ask for comment.
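
To make that workflow concrete, here is a minimal sketch of what chaining such micro-tasks through Amazon’s API could look like, written in Python with the boto3 library against the Mechanical Turk requester sandbox. The prompts, rewards and helper names are illustrative assumptions of mine, not details of the My Boss Is a Robot system.

```python
# A rough sketch (not the project's actual tooling) of chaining two
# Mechanical Turk micro-tasks: step one asks workers to pull the key
# finding out of an abstract; step two would reuse those answers.
import boto3

# Point at the requester sandbox so experiments don't cost real money.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

HTML_QUESTION = """
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <html><body>
      <form name="mturk_form" method="post"
            action="https://workersandbox.mturk.com/mturk/externalSubmit">
        <!-- In a real HIT, a snippet of JavaScript fills this in
             from the worker's URL query string. -->
        <input type="hidden" name="assignmentId" value="">
        <p>{prompt}</p>
        <textarea name="answer" rows="4" cols="80"></textarea>
        <input type="submit">
      </form>
    </body></html>
  ]]></HTMLContent>
  <FrameHeight>450</FrameHeight>
</HTMLQuestion>
"""

def post_micro_task(prompt, title, reward="0.10"):
    """Post one small task and return its HIT id."""
    hit = mturk.create_hit(
        Title=title,
        Description=prompt,
        Keywords="writing, science, summarization",
        Reward=reward,                      # dollars, passed as a string
        MaxAssignments=3,                   # ask several workers, then compare
        LifetimeInSeconds=60 * 60 * 24,
        AssignmentDurationInSeconds=60 * 15,
        Question=HTML_QUESTION.format(prompt=prompt),
    )
    return hit["HIT"]["HITId"]

# Step 1: extract the most important fact from the paper's abstract.
hit_id = post_micro_task(
    prompt="Read this abstract and state its single most important finding: ...",
    title="Pick out the key finding of a research paper",
)

# Later, once workers have submitted, their answers can be collected and
# fed into the next task (e.g. identifying researchers to ask for comment).
results = mturk.list_assignments_for_hit(
    HITId=hit_id, AssignmentStatuses=["Submitted", "Approved"]
)
for assignment in results["Assignments"]:
    print(assignment["Answer"])  # QuestionFormAnswers XML with the worker's text
```
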
Of course, it’s all theoretical right now — an exploration of what is possible, even if it seems improbable — but there’s a precedent.

Aside from actual content farms, CMU’s Niki Kittur — who is leading the technical side of the process — has already done work to automate the writing of Wikipedia entries by breaking them down into 30 or so tasks that can produce passable content for around $3 a time. Giles suggests it will take anything up to 100 tasks, potentially producing work that is “at least an order of magnitude cheaper than a professional journalist.”

Although the experiment is meant to comment on the troubles of content farming as much as the ills of journalism, Giles says he’s already had some negative responses from reporters who are unsettled by the prospect or worry that it will “put them out of a job.” Nobody knows whether it will work, but it’s almost certain to hold a mirror up to both sides of the content farm argument: saying something about the mindless and exploitative nature of low-cost content, while also commenting on the humdrum and (yes) robotic nature of so-called “churnalism.”

“I am scared of the way some people might use it — Demand Media have already automated some parts of the journalism process, for example,” he says. “But if you imagine that it works, and what we’re producing is passable, then you’ve got to start asking yourself questions. If it’s as good as those other 500-word stories, then what does that say about them? If it can be automated, then someone is going to do it — so perhaps we should concentrate on doing something better.”

Of course, there are caveats. There’s lots of trial and error, and they are only looking at simple, uncontroversial reporting. There are also likely to be plenty of pitfalls along the way, and as such, Giles is looking for feedback on the My Boss Is a Robot blog or via Twitter so that he can document reactions and understand what challenges the scheme poses. But the real proof of the pudding will come in a couple of months, when the system is ready to run for real — and can be compared against a real live journalist. “When we get to the end, we’ll do a blind test,” he says. “And then I’ll receive my notice the next day.”

