Big data: The quick and the dead

The IT hype machine has everyone jumping on the big data bandwagon. Consultants are making millions helping companies search for nuggets of insight in loosely related data, and the hardware industry is enjoying a mini-boom as more data than ever is being saved and more storage and computing power is being sold to process it.

Why bigger doesn’t always mean better

At the center of much of the discussion is Hadoop. This confusing collection of distributed computing technology may be open source, but it’s neither cheap nor friendly despite the cute elephant logo. In fact, Hadoop and big data seem like the dream ticket for the vendors of big storage and big iron, many of whom have made expensive acquisitions to get into this lucrative market.

But before we start saving every scrap of data in the enterprise for fear of missing a nugget of insight, shouldn’t we focus on what we already have? Surely the real goal is to enable more people in the company to do more with their existing data before adding new data of undetermined relevance and quality. Perhaps it makes more sense to get off the big data bandwagon and focus on empowering business users to use the data they’ve already got more effectively, rather than feeding the elephant and its ecosystem of hangers-on.

Often, the big data discussion is framed by the implied premise that bigger is better and that adding more data will naturally produce insights. Should you buy into the hype? Big data projects come with big investments in complex computing systems and the specialized skills to make them go. Worse, they are burdened by notoriously long deployment schedules and poor performance.

You don’t need more dead data

Maybe some huge enterprises and government departments do need big data, but what about the rest of us? Can collecting more data help? Perhaps, but you must first answer two questions: Am I getting useful, timely answers from the data I already have? Do I have disciplines in place to operationalize insights and measure their impact on the business? Sadly, if the answer is no, you are not alone. According to a recent study by Freeform Dynamics, only 15 percent of enterprises feel they fully exploit traditional database information for decision making.

It seems that most of the data already stored for analysis goes underutilized. Case in point: Bill Inmon, father of the data warehouse, claims that 95 percent of data in a warehouse is “dormant.” Will adding terabytes or petabytes of unstructured data to your already underutilized data warehouse change this? Probably not. In fact, there’s a good chance it will turn dormant data into dead data.

What companies need is not dormant or dead data. They need data that helps them gain operational insights to make their existing business run better. They need data that empowers their business users to be more creative and productive. They need live, “quick” data, not dormant or dead data. If this makes sense to you, how do you get there?

End big, but start small

First, take stock of what you already have: not just data but also knowledge and skills. Select a project where you can demonstrate incremental improvement with existing resources. If you need to hire, think business analyst, not technologist, because a dollar spent answering a business question is an investment; one spent on specialized IT skills to support the process is a sunk cost.

Second, consider more agile off-the-shelf tools that let you think big, but start small and scale fast. Think friendlier tools, accessible to your existing staff. This approach delivers more business insight today, and many such tools scale well in all but the most extreme big data scenarios. The solution should allow intuitive use by a business manager, with extensibility to support more complex mining by an experienced analyst. Knowledge of the underlying data structure or processing platform should not be necessary.

The analytics engine should run on standard servers, with no proprietary hardware, specialized configurations, database schemas or tuning required to hit the necessary performance. And because loading data into the analytics database can become your most time-consuming effort, connections to your data sources should be based on industry standards and designed to greatly simplify loading data in multiple formats.
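As a rough sketch of what that standards-based simplicity can look like in practice (the file names, the "sales" schema and the choice of Python's built-in csv, json and sqlite3 modules are illustrative assumptions, not anything this column prescribes), loading two common formats through one generic SQL interface might be as simple as:

    # Sketch only: load two common formats into an analytics table
    # through standard interfaces, using only Python's standard library.
    import csv
    import json
    import sqlite3

    conn = sqlite3.connect("analytics.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (region TEXT, product TEXT, amount REAL)"
    )

    # Load a CSV export, e.g. from an operational system.
    with open("sales.csv", newline="") as f:
        rows = [(r["region"], r["product"], float(r["amount"]))
                for r in csv.DictReader(f)]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

    # Load the same shape of data from a JSON feed.
    with open("sales.json") as f:
        records = json.load(f)
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(r["region"], r["product"], float(r["amount"])) for r in records],
    )
    conn.commit()

    # The payoff: a business question becomes a one-line query.
    for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ):
        print(region, total)

The point is not the particular tools but the shape of the workflow: standard formats in, standard SQL out, and no platform-specific plumbing between a business user and an answer.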

Finally, adopt an agile, iterative approach: don’t go big bang on big data. Successful analytics initiatives are built on an ongoing dialog with the data, in which one round of questions yields answers that set up the next round of discovery. With each cycle, more is understood about what data is present, what is relevant, what needs to be added and how much history is worth including. Rapid time-to-answer is the most critical factor in harvesting the value from your data, big or small.

Maybe big data analytics will someday become a must-have for every business, but don’t be persuaded that this is the case now just because management consultants and major vendors are throwing millions of dollars at “What are you missing by not using big data analytics?” messaging. In all likelihood, you’re not missing anything, and your time and money are better spent putting your existing data in the hands of more business users and giving them the tools to do deeper and faster analytics now.

One thing that evolution shows us is that small, agile species tend to do better than large, specialized ones. Maybe we should apply the same thinking to our data?

Fred Gallagher is general manager of Vectorwise at Actian Corporation.

