How one startup wants to inject Hadoop into your SQL

Forget about standalone Hadoop clusters running batch workloads to churn through massive datasets a few times per day or per week. San Francisco-based startup Drawn to Scale thinks Hadoop’s real home within the enterprise will be in your database. No, it’s not trying to sell you some limited-use NoSQL data store; it wants to turn your good, old-fashioned SQL database into a real-time analytic engine that can grow like a weed.

And it wants to make this transition as easy as possible for customers that don’t want to get their hands dirty working with Hadoop or trying to turn a relational database into something it isn’t.

Drawn to Scale’s product, called Spire, is built atop HBase (a distributed NoSQL database that uses the Hadoop Distributed File System), so it scales with ease. Because it uses SQL, users can write robust queries in a familiar manner. Because it uses a distributed index and knows exactly where to go to find the right data, it’s fast as heck. Because it leverages Chef to configure the database, there’s no need to learn the finer points of deploying and managing a Hadoop cluster.
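To make the distributed-index claim concrete, here is a minimal Python sketch of the general idea: rows are spread across nodes, and an index records where each value lives so that a query contacts only the nodes holding matches. This is an illustration under stated assumptions, not Spire’s or HBase’s actual design; the `ToyCluster` class, its method names and the hash-based placement are all hypothetical.

```python
import zlib
from collections import defaultdict


class ToyCluster:
    """Hypothetical model: rows spread over nodes, plus an index on one column."""

    def __init__(self, num_nodes, indexed_column):
        self.nodes = [{} for _ in range(num_nodes)]  # one {row_key: row} per node
        self.indexed_column = indexed_column
        self.index = defaultdict(set)  # column value -> {(node_id, row_key)}

    def _node_for(self, row_key):
        # Assumed placement rule: a stable hash of the row key picks the node.
        return zlib.crc32(row_key.encode("utf-8")) % len(self.nodes)

    def put(self, row_key, row):
        node_id = self._node_for(row_key)
        old = self.nodes[node_id].get(row_key)
        if old is not None:
            # Drop the stale index entry when a row is overwritten.
            self.index[old[self.indexed_column]].discard((node_id, row_key))
        self.nodes[node_id][row_key] = row
        self.index[row[self.indexed_column]].add((node_id, row_key))

    def indexed_lookup(self, value):
        """Go straight to the nodes the index names; return (rows, nodes contacted)."""
        locations = self.index.get(value, set())
        rows = [self.nodes[n][k] for n, k in sorted(locations)]
        return rows, len({n for n, _ in locations})

    def full_scan(self, value):
        """Baseline without an index: every node must be checked."""
        rows = [row for node in self.nodes for row in node.values()
                if row[self.indexed_column] == value]
        return rows, len(self.nodes)


cluster = ToyCluster(num_nodes=8, indexed_column="country")
for i in range(1000):
    cluster.put(f"user{i}", {"id": i, "country": "NZ" if i == 42 else "US"})

# Only one row matches, so the indexed lookup contacts a single node,
# while the scan has to visit all eight.
rows, contacted = cluster.indexed_lookup("NZ")
scan_rows, scan_contacted = cluster.full_scan("NZ")
```

In a real system the index itself would also be partitioned across machines; the sketch keeps it in one dictionary purely for brevity.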

“We’re sort of the only game in town when it comes to cloud-scale databases that run on top of Hadoop,” says Drawn to Scale Founder and CEO Bradford Stephens. There are plenty of companies that have lots of data (petabytes of it, in some cases) and want to find new ways to analyze it, but they don’t necessarily want to learn how to deploy Hadoop or write MapReduce workloads.

As of Tuesday, Drawn to Scale has another carrot to offer prospective customers because Spire now ships with MapR’s M3 distribution pre-integrated as the product’s Hadoop platform. Although it’s the bane of open source devotees such as fellow Hadoop vendors Cloudera and Hortonworks because it uses a proprietary file system in place of HDFS, MapR is gaining quite a following because of the performance benefits its technology brings. In fact, Drawn to Scale is just the latest MapR partner after Amazon Web Services recently tagged it for inclusion in its Elastic MapReduce offering and Google did the same for its Compute Engine cloud.


When you’re building a distributed database, you need a fast distributed file system, Stephens added, and MapR is the fastest Hadoop file system around. Because it’s fully API-compliant with HDFS, though, M3 still works just fine with Spire’s HBase underpinnings. “Despite our small size,” he said, “we are more than willing to pick a horse and really advocate loudly for it.”

Although Drawn to Scale is relatively young (it’s still in private beta and just raised a $925,000 first round in March), large companies are already buying into its message. Stephens told me it’s engaged in seven paid pilot programs with large credit card companies, telcos and other “traditional” types of enterprises. One of them is trying to negotiate a deal for a 1,000-node production deployment, he added. The company’s skeleton crew of mostly engineers couldn’t keep up with the support workload if it opened the floodgates on all the inbound interest, Stephens said.

That kind of demand isn’t surprising when you consider Hadoop’s promise as the platform for a new generation of data-driven applications. Whether or not anyone actually wants to use MapReduce, the programming framework that made Hadoop famous, they do want to take advantage of its nearly limitless scale on cheap, commodity hardware. When companies such as Drawn to Scale bake Hadoop into an otherwise useful product as a technological component rather than as the product itself, customers should experience the best of both worlds.

Feature image courtesy of Shutterstock user Jakub Pavlinec.



GigaOM