We’re witnessing the rise of the graph in big data

GraphLab, a popular open source project dedicated to graph analysis and machine learning, is trying to capitalize on the excitement around graphs by spinning off a commercial entity, GraphLab Inc. GraphLab creator Carlos Guestrin, a University of Washington machine learning professor, will lead the new Seattle-based company, which has raised $6.75 million from Madrona Venture Group and NEA.

Graph analysis is among the hottest techniques around for making sense of large datasets, primarily by determining how tightly different data points are related or how similar they are. The term “graph” came into the broader lexicon along with social networks, which built social graphs to assess the relationships among their millions of users, but the technique has much broader uses.
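To make "how similar they are" concrete, here is a minimal sketch of one common graph similarity measure, the Jaccard index over shared neighbors. The names and friendships are made-up toy data, and this is not GraphLab's API or any particular social network's method, just an illustration of the general idea.

```python
from itertools import combinations

# Hypothetical toy data: these people and friendships are invented for illustration.
friendships = [
    ("ana", "ben"), ("ana", "cho"), ("ana", "dev"),
    ("ben", "cho"), ("ben", "dev"),
    ("cho", "eli"),
]

def build_adjacency(edges):
    """Turn an undirected edge list into a dict mapping each node to its neighbor set."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def jaccard(adjacency, u, v):
    """Fraction of the two nodes' combined neighbors that they share (0.0 to 1.0)."""
    union = adjacency[u] | adjacency[v]
    if not union:
        return 0.0
    return len(adjacency[u] & adjacency[v]) / len(union)

graph = build_adjacency(friendships)

# Score every pair of people, then list the most similar pairs first.
scores = {pair: jaccard(graph, *pair) for pair in combinations(sorted(graph), 2)}
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

Two people with many friends in common score close to 1, and people with no overlap score 0. A recommender could, for instance, suggest connections between high-scoring pairs who are not yet linked.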

My LinkedIn social graph

Guestrin said GraphLab’s algorithms are used in a lot of recommender systems, but he also cited fraud detection in banking networks and intrusion detection in computer networks. We’ve covered graphs as the analytical model of choice for everything from content recommendation to tracking lab work in genomics. Really, though, and especially when combined with machine learning, graph analysis can be applied anywhere there’s too much data for a person to possibly analyze the relationships between every pair of points.

One of Ayasdi's graph-like data maps

Google also famously uses a graph-processing system called Pregel as part of PageRank. Although a number of graph databases and other projects have popped up in the past few years, Guestrin said GraphLab is actually a contemporary of Pregel. He and some colleagues at Carnegie Mellon built a small system for their lab about five years ago, then released it into the open source world with few expectations that it would catch on. Now, he added, Pandora and WalmartLabs are among the project’s user base.
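For readers curious what a PageRank-style computation looks like, the following is a simplified sketch in the spirit of vertex-centric systems such as Pregel: in each round (a "superstep"), every page splits its current rank among the pages it links to, then each page recomputes its rank from what it received. The link data, function name and parameter values are assumptions chosen for illustration; this is not Google's or GraphLab's actual code.

```python
def pagerank(links, damping=0.85, supersteps=50):
    """Iteratively rank nodes of a directed graph given as {node: [linked nodes]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(supersteps):
        inbox = {node: 0.0 for node in nodes}  # rank received this superstep
        dangling = 0.0                         # rank held by nodes with no out-links
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = rank[node] / len(targets)
                for target in targets:
                    inbox[target] += share
            else:
                dangling += rank[node]
        # Each node combines a small baseline with the rank it was sent;
        # rank from dangling nodes is spread evenly so the total stays at 1.
        rank = {
            node: (1 - damping) / n + damping * (inbox[node] + dangling / n)
            for node in nodes
        }
    return rank

# Hypothetical four-page web: "c" is linked to by every other page.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In a distributed system the inner loop would run in parallel across many machines, with the shares exchanged as messages between them. Here everything runs in a single process to keep the idea visible.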

Among those other projects are Giraph (an open source, Hadoop-based Pregel clone used at Facebook) and the graph database Neo4j (which also has a commercial arm, called Neo Technology), as well as Twitter’s Cassovary and fellow University of Washington project Grappa. Guestrin said GraphLab can work with most of them, particularly if they’re not designed to do machine learning at scale like GraphLab is. Some efforts, he noted, are focused on simply storing data in graph form (e.g., databases) or on providing simple graph analysis.

As for when we’ll actually see the results of the effort to commercialize GraphLab, Guestrin said it will be a while. Right now, he’s focused on the next open source release of GraphLab in July. However, the company will begin engaging with commercial users over the next several months to determine what types of features they would expect in commercial graph-analysis software.

The bigger question to come out of all this graph activity, though, is how big a market we’ll ultimately see for graph analysis or any other specific technique. As companies get more comfortable with big data from a technical standpoint, they’re also getting more interested in the different types of analysis it allows. This is evidenced by the quest to make Hadoop support myriad processing frameworks aside from MapReduce.

We already have a handful of commercial graph products on the market — including an industrial grade one called YarcData from supercomputer maker Cray — but how many will there eventually be? And if graph analysis is all the rage right now, what comes next?


GigaOM