When Microsoft and VMware latch onto a technology, you know it’s for real. VMware is now pushing a project called Spring Hadoop that lets developers use the popular Spring Java framework to write big data applications atop Apache Hadoop. It’s estimated Spring has a developer base of more than 5 million, and giving them the ability to write Hadoop applications could be a big deal for both Hadoop adoption and for Spring’s stickiness.
Hadoop, of course, is the Apache Software Foundation project for storing and processing large volumes of unstructured data, often referred to as “big data.” The project gained popularity within large web companies such as Yahoo and Facebook that had to find a way to deal with and analyze mountains of user data in the form of logfiles, photos, clickthroughs and other new formats. Hadoop was inspired by Google’s work on the MapReduce parallel-processing framework and its distributed Google File System.
There is now a whole ecosystem of companies selling Hadoop-based products, from commercial distributions to application-specific analytics software, and many mainstream organizations are using it to analyze their own data. However, the Hadoop MapReduce framework is notoriously difficult to program, which has helped keep mainstream developers away from the valuable data stored within Hadoop.
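To give a sense of the programming model involved, here is a minimal sketch of the map, shuffle and reduce steps as a word count in plain Java. It runs in a single JVM and deliberately uses no Hadoop or Spring Hadoop classes; the class name `WordCountSketch` and its methods are hypothetical, chosen only for illustration. A real Hadoop job adds job configuration, serialization types and cluster deployment on top of this pattern, which is where much of the difficulty tends to come from.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of the MapReduce pattern,
// not the Hadoop API.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: collapse all counts emitted for one word into a total.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, group pairs by key (the "shuffle"),
    // then run reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            totals.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "big data on Hadoop",
                "Hadoop stores big data");
        System.out.println(run(lines));
    }
}
```

In Hadoop itself, the map and reduce steps run on different machines and the grouping step happens over the network, so even this simple logic has to be wrapped in framework-specific classes and configuration.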
According to VMware’s press release:
Key aspects of Spring Hadoop include:
- Support for configuration, creation, and execution of MapReduce, Streaming, Hive, Pig, and Cascading jobs via the Spring container
- Comprehensive HDFS data access support through JVM scripting languages (Groovy, JRuby, Jython, Rhino, etc.)
- Declarative configuration support for HBase
- Dedicated Spring Batch support for developing powerful workflow solutions incorporating HDFS operations and all types of Hadoop jobs
- Support for use with Spring Integration that provides easy access to a wide range of existing systems using an extensible event-driven pipes and filters architecture
- Powerful Hadoop configuration options and templating mechanism for client connections to Hadoop
- Declarative and programmatic support for Hadoop Tools, including FsShell and DistCp
Spring Hadoop is also open source, available under the Apache 2.0 license. More information on programming Hadoop jobs using Spring is available on the SpringSource blog.
This isn’t the first time VMware has given Spring developers access to new types of data stores. In 2010, VMware bought GemStone for its low-latency distributed data grid technology. That’s now part of the Spring Data project, which also features tie-ins to NoSQL databases MongoDB, Riak, Redis and Neo4j.
The news of Spring Hadoop comes a day after Microsoft unveiled an expanded Hadoop strategy that includes a new JavaScript framework for writing Hadoop jobs and an integration with its Excel spreadsheet application. Microsoft’s efforts will open up Hadoop to tens of millions of developers and business users familiar with Microsoft’s products.
Making Hadoop accessible to more developers and, thus, more application types will go a long way toward making it the de facto big data platform for future applications, a development we’ll discuss in more detail at our Structure: Data conference March 21-22 in New York.