A few stats, rumors and stories on Hadoop’s rapid growth

There’s little hard data on the size of the largely private Hadoop market yet, but you can get a clue from looking at what’s going on inside Silicon Valley. The money changing hands and the sizes of the largest players in the space alone are enough to paint a telling picture of a market that’s growing fast in uncharted territory. I’ve collected some of the insights I’ve gleaned over the past few months to try to add some perspective.

Everything, of course, is relative, and we might never see a Hadoop vendor reach the size of a database company such as Oracle, with more than 100,000 employees and tens of billions in annual revenue. After all, Hadoop is a new technology for most companies, so it’s not really moving in on an already lucrative market and stealing budgetary dollars from incumbents. Further, and possibly more importantly, the core Hadoop technology is free and open source, meaning there are lots of unpaid downloads, so money comes from services, support and large enterprises willing to buy software licenses for value-added products.

Money

Here’s a chart showing how much money Hadoop-based companies have raised thus far (although the grand total will likely rise by at least $10 million next week). Keep in mind, Cloudera only launched in 2009 and Hortonworks launched in June 2011. And these aren’t companies that merely bury Hadoop under an application or can connect their technologies to it; these are companies selling either Hadoop or applications designed specifically for it.

In terms of revenue, one might look to a May 2012 report from research firm IDC estimating the size of the Hadoop ecosystem to be around $77 million, growing to $813 million by 2016. Those are both impressive numbers, but they might actually be short-changing reality. For one, as I noted at the time, the authors attributed next to no revenue to Amazon Web Services’ Elastic MapReduce service, which is almost certainly generating at least a few million in revenue each year.

Speaking to me in June, Cloudera CEO Mike Olson also took issue with the number, claiming it didn’t even take Cloudera’s revenue into account, which seems entirely possible considering the business Cloudera is doing. I’ve heard from reliable sources that Cloudera is doing very well and is on track to do about $100 million in revenue this year, very possibly more. And as early as April 2011, Cloudera executives were touting that software license revenue had already surpassed services revenue (although it’s arguable whether that will, or even has to, remain the case).

More anecdotally, I’ve heard from several sources that Hortonworks has already declined at least one potentially appealing acquisition offer. That it wouldn’t sell isn’t surprising: sources say the company is valued at $225 million after its last round of funding and is looking to raise more money. And although it just released its first product in June, the company has impressive and potentially lucrative partnerships in place with Microsoft, Teradata, Rackspace, VMware and other large vendors.

MapR, the proprietary thorn in the sides of both Cloudera and Hortonworks, appears to be doing quite well, too. Vice President of Marketing Jack Norris told me in June that his company had higher license revenue than many would expect and predicted that deals with Amazon Web Services and Google Compute Engine would help the company become “the license revenue leader within the next quarter.”

Former Cloudera VP of Technology Solutions Omer Trajman, who just left to join HBase-centric startup WibiData, shared some insights with me from his days at Cloudera that seem to back up vendor confidence. He said most mature production clusters (excluding monster users such as Facebook) consist of about 200 nodes, and many double in size after the first year. That’s part of the reason Cloudera grew about 10x in size during the three years he was there.

“It has definitely been a rocket ship,” he said. “… You just strap in and hope you make it up.”

Interest is only picking up, too: “There are more people that have started big data projects in the past six months than have big data projects running [in production],” Trajman said.

People

It’s probably not accurate to call companies such as Cloudera, Hortonworks and MapR startups anymore, and we might start to see signs of this shift in personnel moves. Here’s how big they are and expect to become:

  • Cloudera: More than 300 employees globally and growing, especially in the sales department.
  • Hortonworks: 145 employees as of late October and hiring a person per day, on average, through the end of 2012.
  • MapR: More than 125 employees, mostly in technical and engineering positions; starting to build a sales team and looking to more than double headcount in 2013.

While Cloudera and Hortonworks, for example, are still young and nimble enough to lure a fair amount of talent from now-officially large enterprises such as VMware, their employees who joined early and really love the startup life might not stick around.

Trajman’s new home, WibiData, is a fine example of this. It was launched last year by former Cloudera employees Christophe Bisciglia (who actually co-founded Cloudera) and Aaron Kimball to help companies build behavioral-analysis applications on top of Hadoop.

(Maybe there’s a Cloudera mafia shaping up: WibiData’s officemates — MemCachier and Thanx — both count former Cloudera employees as key members or founders of their teams, as does HBase-centric startup Drawn to Scale.)

Trajman, who was one of the first couple dozen employees at Cloudera (and who previously joined Vertica at around the same stage in its growth), told me he likes the rush of getting in at the ground level of new technologies and helping companies do something really new. While he enjoyed establishing and implementing some of the core foundational use cases for Hadoop (e.g., ETL and data exploration) with Cloudera’s early customers, that’s still much of what Cloudera provides to customers because it’s so difficult to build higher-level and higher-value applications at the infrastructural level where Cloudera operates.

“For me, it was very personal in terms of the impact I wanted to have,” Trajman said. At WibiData, he can help users who have the infrastructure part resolved and now want to develop applications that make data analysis a core part of their businesses. Where there’s a focus on innovation, he said, that’s where the innovators go.

This isn’t a bad thing; it’s just a side effect of growth. And when employees stay and innovate in the Hadoop space, it just creates a bigger pie for everyone to share.

Feature image courtesy of Shutterstock user GuskovaNatalia.


GigaOM