Walmart Labs is building big data tools — and will then open source them

Walmart is changing the way it does online commerce, and there’s a big data story behind that shift. Stephen O’Sullivan, senior director of Global e-commerce at Walmart Labs, is preparing the retail giant to move from 10 different websites to one, and from a trial-sized 10-node Hadoop cluster to a 250-node Hadoop cluster. Along the way, his team will build several tools to migrate data from its current Oracle, Netezza and Greenplum gear, and he hopes to open source those tools.

In the video below, he explains the rationale behind the site consolidation and the data consolidation that will follow once the sites begin feeding shopper and transaction data into the Hadoop cluster. He still plans to use some of his existing data warehousing gear, but to a much lesser extent.

However, he’s hit a few snags in moving data from the legacy gear to his Hadoop cluster, so his team is building the tools to make that possible. The plan is to open source those tools for the community, a big change from how Walmart has done business in the past. But as O’Sullivan explains, the efficiency and quality of open source software have reached the point where it just makes sense, and he imagines Walmart will become more of a participant in the open source community overall.

GigaOM