Cloudera who? Intel announces its own Hadoop distribution

Intel on Tuesday said it was getting into the software business with its own Hadoop distribution. The move is a potential blow for startups such as Cloudera, Hortonworks and MapR that are offering their own distributions of Hadoop, but it’s also an admission by the chip vendor that the opportunity in big data isn’t only to be found in selling hardware.

At a conference held in San Francisco, Boyd Davis, VP and General Manager of Intel's Datacenter Software Division, explained Intel's history with Hadoop, which stretches back to 2009, and stressed that Intel is going to share some aspects of its Hadoop distribution, but not all. Intel has already released a Hadoop distribution in China, and today it is bringing it to the United States. Intel's distribution uses Hadoop 2.0 and YARN, a cutting-edge version of the platform compared with what most Hadoop users have deployed thus far.

Why Intel wants to push its own version of Hadoop


Davis introduced partners, including Cisco, which has tuned the Intel Hadoop distribution for its own servers. Intel also hosted a panel that included executives from SAP, Red Hat and Savvis to discuss the challenges of big data and the promise of Hadoop.

Davis was up front about Intel's rationale for releasing its own distribution, namely that it was worried about the fragmentation and possible uncertainty associated with current Hadoop distributions. That could be read as a dig against the many startups already offering Hadoop distributions, all of which are slightly different (of course, Intel's will be slightly different, too). Like existing players such as Cloudera and MapR, Intel will open source certain aspects of its distribution but keep some of its software proprietary.

Inside the data center, it’s no longer just web servers that matter

For example, Davis stressed that Intel will not share its management and monitoring software, which could be highly valuable for enterprise customers. The Intel software could coordinate with Intel's data center management software and make managing a variety of workloads easier. And hidden in that coordination might be one of Intel's aims in pushing its own version of Hadoop: countering the threat of ARM chips in Hadoop clusters.

Dell, Calxeda and others are evaluating the use of lower-performance, lower-power chips in Hadoop clusters, a market Intel would hate to cede in the data center as data grows and analytics becomes more important. To that end, Intel has also optimized its Hadoop distribution for solid-state drives, something that other Hadoop companies haven’t done so far.

When asked about Atom and the use of lower-performance processors for Hadoop, Davis noted that while some people are using lower-end processors for Hadoop, those deployments tend to have slower networking. Davis said that when you combine high-end processors with 10 Gigabit Ethernet and Hadoop, customers get the performance they want.


So while Intel may tout stability and consistency as the reason for its decision to become a major player in the software market for big data, the move is also driven by changes in the data center that threaten Intel's grip on the hardware inside it. The cloud and big data have changed the workloads and hardware requirements for the data center, and Intel is playing the long game by releasing software that can be tuned to its chips.

The Hadoop drama isn’t over yet

Intel isn't the only big vendor touting its own homegrown version of Hadoop. On Monday, EMC's Greenplum division announced an entirely revamped version of its Hadoop distribution that's merged with its flagship analytic SQL database. These big companies have big existing businesses to protect and lots of resources to put into doing it. As my colleague Derrick Harris wrote on the EMC news:

Looking past his competitive boasting, though, it's easy to see [Greenplum's Scott] Yara's greater point when you ask him what all this Hadoop talk means for the data warehouse business on which Greenplum was built. He points to the mainframe business that fell from its high perch decades ago but still drives billions a year in revenue. A single MPP database system is still faster on certain workloads than SQL on Hadoop, but that gap will close over time and "I do think the center of gravity will move toward HDFS," he said.

Hadoop is a juggernaut when it comes to big data. Intel is a juggernaut when it comes to data center infrastructure. Its decision to enter into the open source software market is a big one for the chip company, for the Hadoop ecosystem and for the myriad startups playing in this space. It’s a topic we’ll explore more during our Structure Data conference in New York on March 20 and 21.