Topsy’s search index now includes every tweet since Twitter’s birth in 2006

For something that was once criticized as a nerd’s plaything, Twitter has grown and evolved to become a critical part of not just the social web but also the real-time news ecosystem, thanks in part to events like the Arab Spring. It’s a potential treasure trove of interesting sociological data, but so far researchers have had to make do with small snapshots of that massive archive. Until now, that is: On Wednesday, the social-search engine Topsy announced that its free search index now includes every tweet dating back to the birth of Twitter in 2006.

The company doesn’t say exactly how many tweets that is, but it’s likely in the neighborhood of 300 billion. When the Library of Congress cut a deal with Twitter a couple of years ago to archive all public tweets from 2006 to 2010, that archive contained about 170 billion updates, and the volume of new tweets grew from about 140 million a day in 2011 to about 500 million a day in late 2012. At that pace, the total would now be around 300 billion.
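The 300 billion figure is a back-of-the-envelope linear extrapolation. A minimal sketch of that arithmetic follows. It rests on assumptions that are not stated by Topsy or the Library of Congress: the 170 billion count is treated as the running total at the end of 2012, the rate is held flat at the late-2012 figure of about 500 million tweets a day, and the announcement date is taken to be Wednesday, September 4, 2013.

```python
from datetime import date

# Assumed inputs for illustration only (not official figures):
BASELINE_TWEETS = 170_000_000_000     # ~170 billion, treated as the end-2012 total
BASELINE_DATE = date(2012, 12, 31)    # hypothetical date the baseline applies to
DAILY_RATE = 500_000_000              # ~500 million tweets/day, held constant
ANNOUNCEMENT_DATE = date(2013, 9, 4)  # assumed date of Topsy's announcement


def estimate_total(baseline, start, end, per_day):
    """Linear extrapolation: baseline count plus a constant number per day."""
    days = (end - start).days
    return baseline + days * per_day


total = estimate_total(BASELINE_TWEETS, BASELINE_DATE, ANNOUNCEMENT_DATE, DAILY_RATE)
print(f"{total / 1e9:.1f} billion")  # 247 days * 0.5 billion + 170 billion
```

Under these assumptions the script prints 293.5 billion, which rounds to the "around 300 billion" in the estimate. Because it ignores the lower 2011 rate and any growth after late 2012, it is only a rough order-of-magnitude check.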

Topsy says its search database contains more than 475 billion tweets, videos, images, pins and other social data elements included by Twitter users in their updates. The company also sorts tweets based on its own proprietary sentiment analysis and influence-ranking measures, and says that it can infer location data for more than 95 percent of tweets (only about one percent of tweets are tagged with location by their authors, according to the company).


Although Twitter is still used by only a relatively small proportion of the world’s population, searching for specific keywords and phrases in an archive of hundreds of billions of tweets can reveal interesting patterns, particularly when the search focuses on a key term in a specific area. For example, Harvard researchers studying the aftermath of the earthquake in Haiti found that Twitter could identify areas with cholera outbreaks much faster than traditional methods.

Other researchers have shown that tracking the usage of certain terms on Twitter can provide insight into real-time events such as the Arab Spring uprisings in Egypt and Tunisia, and some have speculated that searching or monitoring keywords during sudden disasters like an earthquake or the bombings in Boston might allow aid workers and emergency personnel to respond much faster to potential danger. Some Wall Street investment firms have also shown an interest in using Twitter data to track stock-market trends.
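The keyword-tracking approach described in the last two paragraphs (counting mentions of a term in one area over time and watching for sudden jumps) can be sketched in a few lines. This is a minimal illustration, not the method used by Topsy or by the Harvard researchers: the `(day, region, text)` record format, the function names `daily_series` and `flag_spikes`, and the trailing-mean threshold are all hypothetical choices made for the example.

```python
from collections import Counter
from datetime import date, timedelta


def daily_series(tweets, term, region, start, end):
    """Return [(day, count)] for every day in [start, end], counting tweets
    from `region` whose text contains `term` (case-insensitive substring)."""
    term = term.lower()
    counts = Counter(
        day for day, place, text in tweets
        if place == region and term in text.lower()
    )
    n_days = (end - start).days + 1
    return [(start + timedelta(days=d), counts[start + timedelta(days=d)])
            for d in range(n_days)]


def flag_spikes(series, window=3, factor=2.0):
    """Flag days whose count exceeds `factor` times the mean of the previous
    `window` days. Days without a full trailing window are skipped."""
    spikes = []
    for i in range(window, len(series)):
        baseline = sum(c for _, c in series[i - window:i]) / window
        day, count = series[i]
        # Floor the baseline at 1 so a jump from 0 to 1 is not flagged.
        if count > factor * max(baseline, 1.0):
            spikes.append(day)
    return spikes


# Hypothetical sample records: (day, region, tweet text).
tweets = [
    (date(2013, 1, 1), "Region A", "Water looks fine today"),
    (date(2013, 1, 2), "Region A", "cholera case reported"),
    (date(2013, 1, 3), "Region A", "no news"),
    (date(2013, 1, 4), "Region A", "Cholera clinic full"),
    (date(2013, 1, 4), "Region A", "more cholera here"),
    (date(2013, 1, 4), "Region A", "cholera spreading"),
    (date(2013, 1, 4), "Region B", "cholera elsewhere"),
]

series = daily_series(tweets, "cholera", "Region A",
                      date(2013, 1, 1), date(2013, 1, 4))
spikes = flag_spikes(series)
print([c for _, c in series], spikes)
```

On this toy data, the daily counts for Region A are 0, 1, 0 and 3, and only January 4 is flagged. A real system working at the scale of hundreds of billions of tweets would rely on an indexed search backend rather than a linear scan, and would need location data for each tweet, which is why Topsy's inferred locations matter for this kind of analysis.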

Post and thumbnail photos courtesy of Shutterstock / Gunnar Pippel

GigaOM