A researcher at MIT claims to have developed an algorithm that can accurately predict which topics will trend on Twitter. But Twitter is a relatively minor business in the grand scheme of things, and the algorithm might prove more useful elsewhere, predicting stock prices, ticket sales and other dynamically changing quantities.
According to a release from the MIT News Office, Associate Professor Devavrat Shah says his model has been 95 percent accurate during testing, predicting trends hours before they appear on Twitter's list. The algorithm takes a new approach to machine learning: it compares real-time data with historical data and predicts outcomes based on the past events that most closely resemble the current situation. So, rather than weighing a topic's chances of trending equally against the entire historical corpus of topics, it assigns more weight to topics whose paths followed similar trajectories up the ranks of top trends.
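The weighting idea can be sketched as a nearest-neighbor-style vote over historical trajectories. This is only an illustrative sketch of the general approach described above, not Shah's actual model; the function name, the squared-distance measure and the `gamma` parameter are all invented for the example:

```python
import numpy as np

def predict_trend(current, examples, labels, gamma=1.0):
    """Score a partial activity trajectory against labeled historical ones.

    current:  recent activity for the topic (e.g., mentions per interval).
    examples: historical trajectories of the same length as `current`.
    labels:   1 if that historical topic went on to trend, 0 otherwise.
    gamma:    how sharply closer trajectories are up-weighted.

    Returns an estimated probability that the topic will trend.
    """
    current = np.asarray(current, dtype=float)
    score_trend, score_not = 0.0, 0.0
    for traj, label in zip(examples, labels):
        # Squared distance between the partial path and a historical path.
        dist = np.sum((current - np.asarray(traj, dtype=float)) ** 2)
        # Exponential weighting: similar trajectories dominate the vote.
        weight = np.exp(-gamma * dist)
        if label:
            score_trend += weight
        else:
            score_not += weight
    return score_trend / (score_trend + score_not)
```

A topic whose early activity climbs like past trending topics scores near 1, while one that tracks past non-trenders scores near 0; a threshold (or tiers of thresholds) on that score would drive the prediction.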
And Twitter is certainly interested in the research. A company spokesperson emailed me to point out that Shah’s graduate research assistant, Stanislav Nikolov, is a Twitter employee.
However, the algorithm's accuracy and speed would have to hold up on a much larger and more complex stage — Twitter's real-life firehose and stockpile of historical tweets — if the company were to use its predictions to charge premiums for ads associated with certain topics, as Shah suggests. Advertisers might balk at paying premium rates for topics that fizzle out before ever becoming top trends (although a tiered rate system based on the model's confidence or, perhaps, projected ranking among top trends could work). Thus far, the algorithm has been trained on a set of 400 topics, half of which trended and half of which did not.
Shah thinks it's a great fit for Twitter data because the data is relatively clean and he has found a strong correlation between past and future activity. Other historical data sets might be messier or noisier than Twitter's, which would make it much harder to filter out extraneous data and discern the real factors that lead to a particular result. However, even Twitter has presented research showing, in the case of its search engine at least, how the sheer volume of data it receives and the speed at which it arrives can make it difficult to accurately predict what someone wants to see.
The good news, though, for anyone willing to give Shah’s algorithm a try is that it’s designed to process data in parallel across scale-out systems like those used by large web companies. Therefore, training it and then running it in production across a voluminous data set won’t run into the same obstacles traditionally faced by machine learning algorithms as data sizes increase. And there are potentially more lucrative and rewarding endeavors that could benefit from this type of predictive power: Shah suggests stock markets, movie ticket sales and public transportation as possibilities, but others might include combating cybercrime by identifying threats earlier or predicting the severity of disease outbreaks.
Feature image courtesy of Shutterstock user turtleteeth.