Is machine learning coming to a system near you?

If you like the idea of your analytics system getting more accurate with each piece of data it ingests, you’re in for an exciting run, because machine learning appears to be catching fire across the ecosystem of big data vendors. The timing isn’t surprising: as companies get comfortable with core big data frameworks such as Hadoop, they want to do more. They want to be what Facebook and Google are now, not what those companies were 5 or 10 years ago, and machine learning is a good start.

At its core, machine learning relies on algorithms that help analytics systems get smarter as they ingest more data. It’s not easy, but it’s very valuable in reducing the need for constant human intervention to analyze data and tweak algorithms accordingly. Companies that are rich in data scientists and that try to predict outcomes such as market activity, customer behavior, computer problems or search queries have been using machine learning for years, and they were investing heavily in recruiting talented employees at least as far back as 2007.

However, until now, machine learning hasn’t been readily available to mainstream organizations unwilling to shell out major bucks to specialists such as IBM or SAS. On Tuesday, though, HPCC Systems, a big-data outlier in that it pushes a non-Hadoop storage-and-processing framework, released a beta version of its new open-source machine-learning algorithms. The goal is simple: let HPCC users move beyond the batch and transactional processing that the platform was built for, and let them use its parallel-processing engine for more-aggressive big data workloads.

HPCC Systems’ release is akin to Mahout, an Apache Software Foundation project that has spent a couple of years pushing the same agenda atop the Hadoop framework and that, until now, was the only attempt at building an open-source library of machine-learning algorithms.

But machine learning is becoming productized, too. On Saturday, I profiled five stealthy big data startups pushing past Hadoop, and machine-learning specialist Skytree was among them. Since then, I’ve been contacted by numerous other data startups, some in stealth mode and some not, all claiming to do machine learning to one degree or another. These companies want to go a step beyond open-source libraries by letting customers benefit from machine learning simply by installing software and pointing their data at it.

All this activity suggests, to me, an exciting time to come in the big data space. With Hadoop or other platforms at the core, companies are getting jazzed about what’s possible and need tools to take their analytics to the next level. Mass adoption of machine learning might still be years out, but the drumbeat is starting now.

Image courtesy of Flickr user hackerfriendly.

GigaOM