As organizations strive to analyze more data than ever, and to do it faster than ever, the results they’re getting might actually be worse, at least temporarily, than those from the pre-big-data, pre-real-time world. During a panel discussion of “master data wranglers,” a major topic of conversation was the trade-off between analyzing lots of data fast and taking adequate time to drive real, meaningful results.
According to Abe Taha of Karmasphere, this problem exists in large part because the cost of storing and analyzing data has fallen so far that organizations can actually take advantage of everything they’re producing. He points to Hadoop, especially, as having democratized big data. Or, as Metamarkets Co-Founder Michael Driscoll puts it, organizations are suffering from the “attack of the exponentials”: storage and bandwidth have gotten exponentially cheaper, making it feasible to tackle all this new analysis. Ideally, he said, the benefits of analyzing the data outweigh the cost of doing so. With all that data now on hand and analyzable, said Glenn McDonald of ITA Software, “the expectation is that you’ll use it.”
But herein lies the problem, says Theo Schlossnagle of OmniTI. We’re listening to more things, he explained, but we’re not listening any smarter. In fact, he thinks the signal-to-noise ratio is very low, often resulting in worse decisions. However, he added, this might just be a matter of growing pains as organizations learn how to do big data optimally. Until then, he said, the question is whether timeliness outweighs correctness.
Driscoll calls this situation “analysis paralysis,” citing the example of the CIA suffering through a decade of weakened analytics efforts before finally figuring it out. Very likely, he said, it could get worse before it gets better.
McDonald sees immediate value in the push to real-time analysis, though: it makes it much easier to figure out the right questions to ask. If it takes 14 hours to rerun an analysis because some factor was weighted incorrectly or something else went wrong, it’s very difficult to learn from your mistakes. If you can “get in” the data, explore it and figure out what’s going on, it’s a lot easier to refine your algorithms, he said. Users must be able to interact with the data.
One thing seemingly everyone agreed on, however, is that we will figure it out, thanks in large part to the same trend that enabled big data: cheap infrastructure. Through options like cloud computing (Driscoll’s company stores about half a petabyte in S3), organizations can afford to take chances they might not otherwise take if it meant spending large amounts on server and storage infrastructure.