Hadoop jumps through hoops, becomes mainstream

One of the things I love most about the software industry is the way new technologies can materialize from unlikely places and get applied in unexpected ways. Hadoop is a great example of this. Conceived in the open source community and shaped by Google, Yahoo and others, this programming framework has emerged as a promising solution to the big data problem.

I expect Hadoop to become enterprise-ready within the next 18 months. Encouraged by the arrival of innovative Hadoop vendors, many Fortune 500 companies — including eBay, Bank of America and JP Morgan — are experimenting with Hadoop deployments. As a technologist and an investor in this sector (Norwest Venture Partners, where I am a general partner, is an investor in Hadapt), I believe these investigations are quickly evolving into serious roll-outs. The following five key factors will accelerate mainstream adoption, making 2012 and 2013 Hadoop’s breakout years.

  • 1.  SQL provides a “fast pass” to Hadoop

The first hurdle Hadoop must clear is the stigma of its origins. As a product of the open source community, Hadoop and its countless siblings are regarded by traditional IT shops with confusion, suspicion, or even abject terror. Whatever their potential, these revolutionary interlopers threaten huge investments in expensive applications and proprietary technologies.

An SQL interface can help bridge the gap between future, current and legacy technologies. Organizations are already purchasing Hadoop tools that offer varying levels of SQL compatibility. We expect Hadoop to gain progressively deeper SQL support, and Hive, an open source SQL interface for Hadoop, is a good start.
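To make the idea concrete, here is a minimal sketch of what that SQL “fast pass” can look like from the application side. Hive exposes a JDBC driver, so code and tools that already speak SQL can query data stored in Hadoop, with Hive compiling the statement into MapReduce jobs behind the scenes. The host name, credentials and table below are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveFastPass {
    public static void main(String[] args) throws Exception {
        // Load the Hive JDBC driver that ships with the Hive client libraries.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Hypothetical HiveServer2 endpoint, credentials and table; adjust for your cluster.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hive-gateway.example.com:10000/default", "analyst", "");
             Statement stmt = conn.createStatement()) {

            // Plain SQL; Hive turns it into MapReduce jobs over data stored in HDFS.
            ResultSet rs = stmt.executeQuery(
                    "SELECT region, SUM(amount) AS total FROM transactions GROUP BY region");

            while (rs.next()) {
                System.out.println(rs.getString("region") + "\t" + rs.getString("total"));
            }
        }
    }
}
```

Because the interface is plain JDBC, existing reporting and BI tools can often point at Hive with little more than a driver swap, which is exactly the bridge to current and legacy technology described above.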

In the next 18 months, I think we will see large retailers, financial services, Wall Street and the government using this “fast pass” SQL option to initiate much broader Hadoop deployments.

  • 2.  Hadoop performance gets a big boost

One of the leading reasons to use Hadoop is its extreme scalability. To date, that scalability has often come with significant performance penalties, including MapReduce query overhead and a storage layer that requires broad scans across the file system. If big data can’t produce information on demand, then it’s just an albatross.

Fortunately, the entire Hadoop industry is aggressively tackling these performance issues, from a rapidly proliferating group of startups (Cloudera, Hadapt, Hortonworks, MapR) to the amazingly innovative open source community and established vendors such as IBM. The forthcoming Hadoop v0.23 and subsequent releases will include enhancements that improve basic file system performance, reduce minimum MapReduce job latency, and speed up higher-level query interfaces such as Hive and Apache Pig.

  • 3.  Hadoop becomes increasingly reliable

To avoid having a single point of failure, Hadoop needs to address topology and deployment concerns left over from its initial incarnation. Hadoop relies on a master node (the NameNode) to track where data lives and how to reach it. If this “brain” goes down, the entire cluster can be at risk unless the topology provides redundancy. Over time, the Hadoop community will make improvements in this area, and Cloudera, Hortonworks, MapR and other commercial vendors are already addressing it.
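As a rough illustration of where that work is headed, the sketch below shows a client being pointed at a redundant pair of master nodes instead of a single host, using Hadoop’s Java configuration API. The nameservice, node names and hosts are hypothetical, and this style of configuration assumes a later Hadoop release that includes the high-availability work described above.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical logical nameservice backed by two redundant NameNodes,
        // so the loss of one "brain" does not take the file system down.
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "master1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "master2.example.com:8020");
        // Client-side failover: retry against the standby if the active node dies.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // The client addresses the logical cluster name, not a single master host.
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
        System.out.println("Root exists: " + fs.exists(new Path("/")));
        fs.close();
    }
}
```

The important point is that clients talk to a logical cluster name rather than one physical master, so losing a single node no longer has to mean losing access to the data.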

  • 4.  Mainstream case studies emerge

Hadoop is a grassroots phenomenon that emerged in the social networking and consumer Internet world. As always, there are early adopters who take risks on the cutting edge, and there are more conservative organizations watching the pioneers from the sidelines.

This played out in 2011 as early customer experiences with Hadoop were shared via conferences, online forums and vendor white papers. Experts think Hadoop is approaching a tipping point, as some of the earliest implementers move from experiments to adoption. As a result, organizations implementing Hadoop today benefit from the lessons those pioneers learned.

In 2012 and 2013, we will see a growing body of case studies and the emergence of best practices as Hadoop technology matures and gets deployed in traditional enterprise environments. In short, Hadoop’s momentum will grow exponentially in the next 18 months.

If becoming mainstream is step four in the technology adoption process, Hadoop will move through step two and into step three this year and next.

  • 5.  The architecture evolves

Hadoop applications process vast amounts of data in parallel across many computers, relying upon MapReduce as the enabling distribution framework. Currently, Hadoop tightly couples distributed resource management and a single distributed programming paradigm (MapReduce) into one package. The Hadoop community is now decoupling the two functions, an effort that has taken shape as YARN. Separating them will provide more control over the different system functions and free query processing from its dependence on MapReduce.
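For readers who have not worked with the programming model in question, the condensed sketch below shows the shape of a typical Hadoop job, the canonical word count, written against the MapReduce Java API. The input and output paths are supplied on the command line and are illustrative.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs in parallel across input splits, emitting (word, 1) pairs.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: the framework groups values by key; we sum the counts.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Illustrative HDFS paths passed on the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Today the same framework also schedules this job across the cluster; once resource management is decoupled, those cluster resources can be shared with programming paradigms other than MapReduce.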

Future releases of Hadoop will have an enhanced MapReduce framework and will feature a growing array of alternative distributed computing paradigms. Likely candidates include Message Passing Interface (MPI), distributed shell systems, OpenDremel and Bulk Synchronous Parallel (BSP). With these additional programming and distribution options, Hadoop will be able to support an even greater variety of workloads.

Hadoop is here to stay

Over the next few years, Hadoop will become a common component of the standard IT tool belt. To meet this demand, vendors are starting to package Hadoop as commercial off-the-shelf (COTS) software.

Hadoop adoption will build on itself as organizations augment Hadoop solutions and grow ecosystems around them. Before our very eyes, Hadoop is becoming a platform.

Matt Howard is a general partner at Norwest Venture Partners (NVP), where he invests in mobile and wireless, big data, security, rich media, networking and storage sectors. He currently serves on the boards of Avere Systems, Blue Jeans Network, ConteXtream, Hadapt, MobileIron, Retrevo and Summit Microelectronics. He blogs at NVP Blog.
