Has Ayasdi turned machine learning into a magic bullet?

A couple of months ago, I had a chance to see (and I mean really see) what might become a common first step in answering questions and curing diseases that have been plaguing us for centuries. It was a demonstration by the founders of Ayasdi, a Palo Alto, Calif., startup that has developed software for visually mapping the hidden connections in massive datasets. The company launched publicly on Wednesday and brought with it $10.25 million in Series A funding from Khosla Ventures and Floodgate.

At its core, Ayasdi’s product, a cloud-based service called the Insight Discovery Platform, is a mix of distributed computing, machine learning and user experience technologies. It processes data, discovers the correlations between data points, and then displays the results in a stunning visualization that’s essentially a map of the dataset and the connections between every point within it. In fact, Ayasdi is based on research into the field of topological data analysis, which Co-founder and President Gunnar Carlsson describes as a quest to present data as intuitively as possible based solely on the similarity of (or distance between, in a topological sense) the data points.

It might sound complicated, but anyone familiar with the idea of social networks can probably understand how Ayasdi’s software works. Our social graphs are maps of relationships between ourselves and all of our connections, in which clusters of names (called “nodes” in computer science parlance) and connections (called “edges” in that same parlance) illustrate who’s connected to whom. Assuming someone is connected with people from different parts of his or her life, the graph will likely bear this out visually via, for example, distinct clusters of professional, personal and academic connections.

My LinkedIn social graph (via LinkedIn Labs). Generally speaking, blue is cloud computing, green is big data, dark orange is co-workers/other journalists, light orange is law school, and purple is a former employer.

Of course, the algorithms that decipher these connections don’t know how everyone is connected, just that they are. When we see our social graphs, it’s up to us to figure out why things shape up the way they do and whether that’s significant.
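For the techies, here is a minimal Python sketch of that idea: given only a list of who is connected to whom, a simple breadth-first search can group the names into clusters without knowing anything about why they are connected. The names and the `find_clusters` helper are made up for illustration; this is not how LinkedIn or Ayasdi actually does it.

```python
from collections import deque

# Hypothetical toy graph: names are nodes, pairs are edges.
edges = [
    ("Ana", "Ben"), ("Ben", "Cy"),  # one group of contacts
    ("Dee", "Eli"),                 # a separate group
]

def find_clusters(edges):
    """Group nodes into connected components using breadth-first search."""
    # Build an adjacency map: node -> set of directly connected nodes.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seen, clusters = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Explore everything reachable from this unvisited node.
        cluster, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            cluster.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(cluster)
    return clusters

clusters = find_clusters(edges)
```

The code finds two groups, but labeling them ("co-workers," "law school") is still a job for the person looking at the result.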

In essence, Ayasdi builds and visualizes social graphs for any type of data, although the datasets it works with are much more complicated than social data. In the demonstration I saw, for example, the software analyzed medical data from 272 cancer patients covering 25,000 different genetic markers. In a few seconds, the software’s hundreds of machine-learning algorithms analyzed all those variables and created a map showing its own distinct set of clusters.
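To make the "social graph for any type of data" idea concrete, here is a toy Python sketch that turns numeric records into a graph by connecting any two records that sit close together. Everything in it is an assumption for illustration: the record IDs, the two made-up measurements per record, the `similarity_edges` helper and the use of plain Euclidean distance with a fixed cutoff. Ayasdi's actual algorithms are not described here and are certainly more sophisticated.

```python
import math
from itertools import combinations

# Hypothetical records: each ID maps to a few made-up numeric measurements.
records = {
    "p1": (0.0, 0.1), "p2": (0.2, 0.0), "p3": (0.1, 0.3),
    "p4": (5.0, 5.1), "p5": (5.2, 4.9),
}

def similarity_edges(records, max_distance):
    """Connect every pair of records whose Euclidean distance is within max_distance."""
    return [
        (a, b)
        for (a, va), (b, vb) in combinations(records.items(), 2)
        if math.dist(va, vb) <= max_distance
    ]

# Records p1-p3 end up linked to one another, as do p4 and p5,
# while no edge crosses between the two groups.
edges = similarity_edges(records, max_distance=1.0)
```

With 25,000 markers per patient instead of two measurements, the choice of distance measure and cutoff becomes the hard part, which is where the real product presumably earns its keep.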

When there are thousands of variables and many, many more possible combinations in play, Co-founder and CEO Gurjeet Singh explained, “There are so many questions to ask that you don’t have the time to ask them all.” He added, “It doesn’t even make sense to think about where to start your analysis.”

Ayasdi’s maps look like this (although no two will ever look the same). If a researcher were looking at the map, he or she would see where it might be best to start looking for answers:

[Image: Ayasdi Product Image 2]

In the case of cancer data, that red cluster might represent a group of cancer survivors. As researchers dig into this area further, they might find, for example, that none of these survivors underwent chemotherapy and all share a rare genetic trait that might make them particularly well suited to fight off cancer cells. And while Ayasdi is working primarily with medical and pharmaceutical researchers, and claims users have already uncovered new insights even from old datasets, the company is also working with organizations involved in detecting financial fraud and terrorist activity, and those trying to discover new oil fields.

In fact, readers of Wired might have gotten a sneak peek at Ayasdi last year in reporter Jeff Beckham’s post about a project by then-Ayasdi-intern Muthu Alagappan. At the MIT Sloan Sports Analytics Conference, he demonstrated how the software was able to detect statistical patterns among professional basketball players. The resulting map showed how professional basketball players’ skills actually break down into 13 fairly well-defined classifications, which Alagappan then labeled, beyond the five positions by which players are usually classified.

[Image: 13basketball]

As I explained in November while covering startup company BeyondCore, which uses different technology to achieve a similar end, the beauty of what Ayasdi does is that users can see the correlations without having to think up and ask the questions that would expose those connections. Asking those questions, of course, has been the modus operandi for business-intelligence applications and other analytics software since the beginning of time: Users write queries or otherwise ask questions of datasets, and the software only deals with the variables it has been asked to analyze.

If a relationship is too hidden for the naked eye to see or too complex for a researcher or analyst to really consider, the software likely won’t be asked to expose it. Software like Ayasdi and BeyondCore (the CEOs of both companies will be presenting at Structure: Data in March, by the way) aims to eliminate much of that guesswork and let users focus their subject-matter expertise on actually figuring out why the data points are connected as they are.

“We don’t necessarily need to treat computers like dumb question-answering machines,” Singh said. “We can actually make them do a lot more work.”

And as cutting-edge as this might sound, the techies out there should be interested to know that Ayasdi and its underlying technology are hardly new. The company has been in stealth mode since 2008, and is the result of 12 years of research involving DARPA, Stanford University (where Carlsson moonlights as a math professor) and the National Science Foundation. Still, for anyone who spends their days writing SQL queries, writing segmentation algorithms or manipulating BI applications, the idea of having software take over much of that discovery work might seem futuristic.

It might even seem too good to be true. When I asked Carlsson how he addresses this mindset among skeptics, he suggested they at least give the software a try before dismissing it. As a comparison, he pointed to how people are able to spot a basketball in a garage by seeing everything in one shot without having to send a beam in every direction. “In a sense,” he said, “you could view the human visual system as too good to be true.”


GigaOM