An algorithm for tracking viruses (and Twitter rumors) to their source

No, Vanilla Ice isn’t dead — and if he had access to a new algorithm from Swiss researcher Pedro Pinto, the Ice Man could go all techno-ninja and track down who started the rumor claiming he was. That’s because Pinto and his colleagues at the Ecole Polytechnique Fédérale de Lausanne have developed an algorithm for finding the source of such rumors, as well as viruses (physical and digital) and other maladies, even across highly complex networks.

Their method, according to an abstract of a paper just published in Physical Review Letters, is ideal for situations where there is relatively little data to work with, and is “based on the principles used by telecommunication towers to pinpoint cell phone users.” Essentially, the algorithm starts by looking at a small collection of points within a network and working back from there to determine the origin, much as investigators can zero in on a cell phone’s location using triangulation. The more connections each of those observed points has, the fewer observers are needed to track down the source.
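To make the triangulation idea concrete, here is a minimal Python sketch. It is not the estimator from the paper, whose details the abstract does not give. It is a toy version that assumes an undirected network, a rumor that takes exactly one time unit to cross each link, and a handful of observers who record when the rumor reached them. The function names, the six-node network and the arrival times are all made up for illustration. For each candidate source, the code asks when the rumor would have had to start for each observer's timestamp to make sense, and picks the candidate on which the observers agree most closely.

```python
from collections import deque


def bfs_distances(graph, start):
    """Hop counts from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def estimate_source(graph, arrival_times):
    """Return the node whose implied start times agree best across observers.

    Assumption (not from the article): each hop takes exactly one time unit.
    """
    best_node, best_score = None, float("inf")
    for candidate in graph:
        dist = bfs_distances(graph, candidate)
        if any(obs not in dist for obs in arrival_times):
            continue  # candidate cannot reach every observer
        # Start time each observer implies if `candidate` were the source.
        starts = [t - dist[obs] for obs, t in arrival_times.items()]
        mean = sum(starts) / len(starts)
        score = sum((s - mean) ** 2 for s in starts)  # disagreement
        if score < best_score:
            best_node, best_score = candidate, score
    return best_node


# Hypothetical network: A-B, B-C, C-G, B-D, D-E.
graph = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "G"],
    "D": ["B", "E"],
    "E": ["D"],
    "G": ["C"],
}

# Only three of the six nodes report when the rumor reached them.
arrival_times = {"A": 12, "G": 13, "E": 11}

print(estimate_source(graph, arrival_times))
```

With real data the delays are noisy and several candidates can score almost equally well, which is one reason a method like this may return a short list of likely sources rather than a single answer.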

Tracking a cholera outbreak that spread over a river network.

Aside from tracking the spread of web rumors, the team also successfully tested the algorithm against a cholera outbreak in South Africa (analyzing its spread across both water and human networks) and the 9/11 attacks in the United States. Both times, it was able to identify the sources (among a small list of possibilities, at least) using only a fraction of the publicly available data on those events. Thankfully for Vanilla Ice and others concerned with the spread of information over the web, Pinto’s system has an easier time with that type of data because it’s usually time-stamped, which makes it easier to figure out who was “infected” first.

In an article from Ecole Polytechnique Fédérale de Lausanne describing the research, Pinto explains that his team’s method could also be used for everything from identifying the source of a computer virus, to determining the blogs most likely to make web content go viral, to preventing the spread of an epidemic or chemical attack by learning how it’s spreading.

With so much research coming out to analyze data around crime, disease and other perils, it will be interesting to see the results when the work makes its way out of the lab and into the real world. Death rumors on social media are often just good fun, but using data science to stop the spread of an epidemic would really be something. Hopefully, public health, law enforcement and other officials are keeping up with the tools now at their disposal.



GigaOM