Diffbot helps apps read the web like humans

Diffbot, a Palo Alto, Calif. start-up, is trying to help developers build apps that read the web like humans. That means Diffbot’s technology uses visual learning robotics and artificial intelligence to view content on the web visually, instead of parsing the underlying HTML code. It sounds a little geeky, but what it does is give developers and publishers a way to analyze the web and organize it in ways that are very easy for people to digest, especially mobile users.

The company, which has just opened up its first APIs to the public, is encouraging developers to start using them for a variety of applications. What might these apps look like? Well, AOL joined a private beta earlier this year and is now using Diffbot to help personalize its Editions news reader iPad app by grabbing content and pulling out the top news stories from other sources. Diffbot is able to look at a content source and determine what kind of page it is and what its elements are, such as headlines, images and advertisements, and to contextually understand the content on the page. It can determine what the top story is on a news site.

That’s helpful for news aggregators in determining what to present and creating a clean experience. Co-founder Michael Tung told me that he’s talking to tablet news aggregator apps that compete with Editions and I’m guessing some will look at implementing Diffbot’s API. These companies are often doing this kind of work themselves but will now have an option in Diffbot.

But it’s not just news apps that can use Diffbot. Nuance uses the Diffbot API to build large domain-specific text corpuses to train its natural language processing system to recognize speech more accurately in specific areas like medicine. Hacker News Radio uses Diffbot to take the top Hacker News stories and turn them into text-to-speech content for an online radio station. You can see some other apps that use Diffbot’s API here.

Tung said Diffbot can really be useful in mobile applications, which have limited screen real estate and can use extra intelligence to help present content.

“We have hundreds of beta developers and a lot of them are working on mobile apps,” he said. “Web pages don’t look very good on mobile devices, they’re designed for large screens. A lot of the apps are using us because they want to incorporate web data and display it in a better way on mobile devices or do something custom.”

Diffbot is releasing two APIs. The On-Demand API looks at two types of pages, front pages and articles: the Frontpage API analyzes home pages and indexes things like headlines, bylines, images, articles and ads, while the Article API can extract article text, pictures and tags from news pages. A second main API, called Follow, can be used to follow any changes or updates made to a web page and pull out the pertinent data. Developers will get 50,000 API calls per month for free on a self-serve basis, while a cloud plan allocates 100,000 free calls a month and then costs $0.002 per call after that.
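
For developers trying to budget, the cloud plan pricing quoted above works out to simple arithmetic. Here is a minimal Python sketch of it, using only the two figures in this article (100,000 free calls a month, then $0.002 per call). The function and constant names are hypothetical and are not part of any Diffbot API, and the sketch assumes billing is strictly per call with no other fees or tiers.

```python
from decimal import Decimal

# Figures quoted in the article for the cloud plan.
CLOUD_FREE_CALLS = 100_000
CLOUD_PRICE_PER_CALL = Decimal("0.002")  # US dollars per call beyond the free quota


def cloud_plan_cost(calls_this_month: int) -> Decimal:
    """Hypothetical helper: estimated monthly bill in dollars on the cloud plan."""
    if calls_this_month < 0:
        raise ValueError("calls_this_month must be non-negative")
    # Only calls above the free monthly allocation are billed.
    billable = max(0, calls_this_month - CLOUD_FREE_CALLS)
    return billable * CLOUD_PRICE_PER_CALL


if __name__ == "__main__":
    for calls in (50_000, 100_000, 250_000, 1_000_000):
        print(f"{calls:>9,} calls -> ${cloud_plan_cost(calls):,.2f}")
```

Under those assumptions, an app making 250,000 calls in a month would pay for 150,000 of them, or $300.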

Tung said there are about 30 different page types on the web and Diffbot will be opening up APIs for those over time, including profile pages, event pages and product pages. As those become available, expect more developers to give Diffbot a try. Developers can grab more data from a variety of web sources and put it together into some interesting apps that should be useful for consumers.

Diffbot was built by Tung and Leith Abdulla, two Stanford students who were able to land funding from Stanford’s venture fund, SSE ventures. Tung said the company is profitable and is in the process of expanding beyond its five-person team.
