Red Hat open sources its take on Hadoop storage

Red Hat is getting into the big data act more deeply, announcing on Wednesday that it’s open sourcing its Red Hat Storage Hadoop plugin as an alternative to the Hadoop Distributed File System. The plugin, which the company expects to release to the Apache Software Foundation sometime this year, is based on the Gluster File System technology Red Hat acquired when it bought Gluster for $136 million in 2011. Red Hat has already incorporated Gluster’s technology into its Red Hat Storage Server product.

According to a press release announcing the news:

“Red Hat Storage brings enterprise-class features to big data environments, such as Geo replication, High Availability, POSIX compliance, disaster recovery, and management, without compromising API compatibility and data locality. Customers now have a unified data and scale out storage software platform to accommodate files and objects deployed across physical, virtual, public and hybrid cloud resources.”

Because it’s fully distributed, Red Hat’s file system does away with the NameNode that keeps track of data in most Hadoop clusters and that can be both a bottleneck and a single point of failure. (The Hadoop community has mitigated some of these concerns with Apache Hadoop 2.0, as has Facebook with its own engineering effort called AvatarNode.) Red Hat has also combined its storage and virtualization technologies, so anyone using both can have a virtual pool of storage and compute resources residing on the same physical infrastructure.
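To make the drop-in idea concrete, here is a minimal Java sketch of how Hadoop’s pluggable FileSystem API lets client code stay the same while an alternative store replaces HDFS underneath. The glusterfs:// scheme, the storage-node hostname, and the GlusterFileSystem class name are illustrative assumptions about how such a plugin would be wired in, not confirmed details of Red Hat’s release; the FileSystem calls themselves are standard Hadoop API.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AlternativeFsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Assumed wiring: map the glusterfs:// scheme to the plugin's
        // FileSystem implementation and make it the default file system.
        // Property and class names are hypothetical and would come from
        // the plugin's own documentation.
        conf.set("fs.glusterfs.impl",
                 "org.apache.hadoop.fs.glusterfs.GlusterFileSystem");
        conf.set("fs.default.name", "glusterfs://storage-node:9000");

        // Client code is unchanged from the HDFS case: the same FileSystem
        // and Path calls work regardless of which store backs them.
        FileSystem fs = FileSystem.get(
                URI.create("glusterfs://storage-node:9000"), conf);
        Path out = new Path("/user/example/hello.txt");
        try (FSDataOutputStream stream = fs.create(out, true)) {
            stream.writeUTF("written through the pluggable FileSystem API");
        }
        System.out.println("exists: " + fs.exists(out));
        fs.close();
    }
}
```

The point of the sketch is that swapping the storage layer is a configuration change rather than an application change, which is what makes HDFS alternatives like this one plausible for existing Hadoop workloads.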


Red Hat isn’t the only company or organization trying to eliminate problems associated with HDFS or to improve its utility to large enterprises and/or webscale companies. Companies such as EMC and NetApp are offering their own alternatives, and Quantcast actually built and open sourced its own version of HDFS called the Quantcast File System. As we’ll be discussing at Structure: Data next month, Hadoop’s future prospects will be determined by how applicable it is to the workloads companies want to run, and some of the HDFS alternatives might represent solid options for enterprise workloads until Apache Hadoop can catch up.

Of course, Red Hat being Red Hat, Wednesday’s news isn’t all about big data. The company makes it pretty clear that it expects its Hadoop efforts to be part of a broader cloud computing push, in which companies can run their applications in big data environments that span private and public resources, including OpenStack and the Amazon Web Services clouds.
