What we’ll see in 2013 in data

Whether we know it or not, data (big, small or otherwise) is becoming a central component of the way we live our lives. Various technologies have already enabled our favorite web companies and application developers to mine user data at a scale never before possible, and industries from health care to energy are figuring out that these same techniques can turn their iron mountains of data into gold. However, the data revolution is just getting started.

Here are three reasons I think, or hope, that the world of data is going to get bigger, better and more personal in the next year. If you’re looking for more, check out my November 2012 post highlighting the top trends shaping the way we use, analyze and consume big data.

Get ready for Hadoop as you’ve never seen it before

The new year should bring good news and bad news if you’re sick of hearing about Hadoop. The bad news is that you’re going to hear more about it. The good news is that you’re probably going to hear about it in a whole new light.

All over Silicon Valley, startups are working on ways to turn Hadoop into something more useful than a batch-processing engine that runs on tens or hundreds of physical servers. Sometimes, it’s as simple as offering Hadoop as a cloud service. Amazon Web Services, Mortar Data, Infochimps and others already do this, and newer offerings such as VertiCloud — headed by former Yahoo CTO Raymie Stata — and Microsoft Windows Azure HDInsight are also coming down the pike.

Probably more important, though, is the rash of companies trying to make Hadoop more useful by turning it into a platform for something other than running MapReduce jobs. Applications are the key to the adoption of any technology platform, and there just aren’t too many compelling applications for Hadoop in its current state. We’ve covered a handful of the companies doing this work already (Continuuity, Platfora, Drawn to Scale), but they’re just the tip of the iceberg.

I have it on good authority, for example, that we’ll see more real-time/streaming Hadoop frameworks in the coming months. These probably won’t be the only MapReduce alternative we see pop up as projects such as Drill and Impala mature, and Apache Hadoop’s YARN feature catches on. And it’s hard to imagine Hadoop won’t be used more and more as the core of enterprise software applications targeting companies in major fields such as manufacturing, financial services, or oil and gas.

Yes, there are other technologies floating around that do certain things better, but Hadoop is still the biggest thing in big data. It’s not going anywhere.

The Google-Ray-Kurzweil singularity

Ray Kurzweil’s decision to come on board as Google’s director of engineering is fascinating for so many reasons. Not the least of these is that, while Google-bashing has become high art in some circles, the company is nonetheless one of the biggest drivers of innovation, and useful innovation at that, on the planet. Self-driving cars could improve road safety dramatically, and having just returned from a week and a half in China with next to no knowledge of Mandarin, I’m even more excited about the possibilities of Google’s Project Glass.

However artfully disguised, artificial intelligence (informed by lots and lots of data) is of course the key to everything Google is trying to do. It wants to build systems smart enough to know where we’re going, how to get there and what we’re seeing once we’re there — and then able to predict where we’ll go or what we want to see next.

This vision seems like a perfect fit for Kurzweil, whose predictions about the singularity — a seamless fusion of man and machine — seem more accurate with each passing day. And not only does Kurzweil have decades of experience predicting just the types of things Google has been creating recently, he also has experience building things. Things like the Kurzweil synthesizer and speech-recognition software. That’s why, one can assume, his title is director of engineering rather than some made-up novelty title that just looks to capitalize on Kurzweil’s name.

If Google and Kurzweil can find a way to work symbiotically as employer and employee, who knows what they’ll be able to pull off. Maybe it will be an even crazier batch of ideas with which to dazzle the public, but it might also be some legitimate progress on Google’s current batch of ideas (including those hidden away inside Google X) that have promise today but need some old-school engineering know-how.

Data for the people

If you’re like me, you’ve downloaded your Fitbit data (see disclosure) only to find out it’s not too useful. Unless you’ve already tied data regarding how many steps you’ve walked and the calories you’ve consumed each day to a more meaningful data point that Fitbit doesn’t track, such as mood or stress, a spreadsheet full of numbers and a few charts don’t tell you much.

And it claims I'm obese to boot!

The same holds true for almost every other tool available to quantify or otherwise visualize the lives of everyday people. So what if I tweet more often on Tuesdays, have added fewer Facebook friends this year than last or tend to eat more calories on days I walk more? And what about all those days I forgot to log my activity (like every day now for household chores) or fell asleep without putting a device on my wrist? The chances are your data is either messy/incomplete, isn’t voluminous enough to derive statistically meaningful relationships and/or simply isn’t measuring the right stuff.

Even if you have some amateur data-analysis skills or are wise to the ways of Statwing and Tableau, there’s just not a lot to learn about yourself. If weight loss is the goal, for example, eating better and exercising more will always be the answer. In order for the quantified self movement to catch on beyond a group of data-obsessed body hackers and a handful of advanced calorie counters, people need better information from all their data.

What I’d like to see in 2013 is a combination of applications, data and devices that makes it easy for average consumers to learn about themselves in some meaningful ways. By mashing up health data from devices, location data from a personal-assistant app and textual data from Twitter, for example, users might be able to correlate exercise levels, diet and hours spent commuting with how positive or negative their general tone is, or the subjects they choose to comment on.
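For the curious, here is a rough sketch in Python of what that kind of mashup might look like under the hood. It is only an illustration, not how any particular app works: the daily numbers are made up, the tone scores are assumed to come from some unspecified sentiment scorer, and the helper names (pearson, correlate) are hypothetical. It joins two date-keyed series on the days they share, which is one simple way to cope with days that never got logged, and then computes a plain Pearson correlation.

```python
from math import sqrt

# Hypothetical daily records keyed by date; every value here is invented.
steps = {
    "2013-01-07": 4200,
    "2013-01-08": 9800,
    "2013-01-09": 7500,
    "2013-01-10": 3100,
    "2013-01-11": 11000,
}
commute_hours = {
    "2013-01-07": 2.5,
    "2013-01-08": 1.0,
    "2013-01-10": 3.0,  # 2013-01-09 was never logged
    "2013-01-11": 0.5,
}
# Assumed output of some sentiment scorer: -1 (negative) to +1 (positive).
tweet_tone = {
    "2013-01-07": -0.4,
    "2013-01-08": 0.6,
    "2013-01-09": 0.2,
    "2013-01-10": -0.7,
    "2013-01-11": 0.8,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def correlate(a, b, min_days=3):
    """Join two date-keyed series on their shared dates, then correlate."""
    shared = sorted(a.keys() & b.keys())
    if len(shared) < min_days:
        raise ValueError("too few overlapping days to say anything")
    return pearson([a[d] for d in shared], [b[d] for d in shared])


if __name__ == "__main__":
    print("steps vs. tone:  ", round(correlate(steps, tweet_tone), 2))
    print("commute vs. tone:", round(correlate(commute_hours, tweet_tone), 2))
```

With only a handful of days, any number this produces is anecdotal at best, and a correlation says nothing about cause. A real product would need far more data and a lot more care before telling anyone that their commute is souring their tweets.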

I want to see data analyzed and displayed as information I can really use to improve my life, tied to metrics I might not think to (or be able to) connect on my own. Another collection of isolated numbers, pointless peer comparisons and obvious bits of advice just won’t cut it.


GigaOM