Kaggle is now crowdsourcing big data creativity

In a move to expand its utility beyond simply finding better answers to known statistical problems, hot data-science startup Kaggle is now letting its stable of expert data scientists compete to tell companies how they can improve their businesses with machine learning. It’s part of a natural evolution of Kaggle from a plucky startup to an IT company with legs, but it’s actually more like a prequel to Kaggle’s flagship predictive modeling competitions than it is a sequel.

What Kaggle has come to realize, Co-Founder and Chief Scientist Jeremy Howard told me, is that very few people in the world know more about machine learning than do the roughly 40,000 individuals who compete in Kaggle’s competitions. At the same time, the big data movement is putting the relatively complex but rewarding practice of machine learning on the radar of more companies, and they want to know how to leverage it for their businesses.

But without the right knowledge, Howard said, companies new to machine learning will likely either underreach and lose out on possible benefits or overreach and risk overwhelming themselves. Who better to educate them than the world’s largest (and, presumably, best) online data scientist community?

The new offering, called Prospect, works rather simply: Customers submit a sample of the data they want to analyze better, and competitors play around with the data and propose ideas on how best to use it. When the competition deadline arrives, a winner is selected by some combination of community voting and customer selection. Customers pay their fee to Kaggle and the reward to the winner, and are free to do what they want with the advice.

“The worst case scenario is you spend a reasonably small amount of money and decide this isn’t a product for you,” Howard said, although he suspects most companies will end up wanting to take the next step and actually develop and implement models.

Inaugural Prospect customer Practice Fusion, at least, plans to do just that. The electronic health records provider is offering a $500 prize for the best idea on how to use its data, and then will conduct a $10,000 competition to actually develop a model based on that idea.

Howard said this process is part of the “analytics value chain” that Kaggle hopes to create as its own business model matures. Companies can use the platform to figure out how to best use their data, then build and train their models, figure out how to implement them in production and at scale, and finally maintain and improve them. “You have to get everything right in order to actually leverage your data,” he said, and Kaggle wants to help at every step.

Kaggle thinks it can become the go-to source for anyone needing help with machine learning because its approach forces the platform to prove its worth. Fairly or not, Howard characterizes traditional analytics consulting firms as “having quite a lot of snake oil salesmen” that often don’t “have anything to back up their pretty pictures and words.” By contrast, he says Kaggle’s competition approach “is by far the best method in the world for actually building and training [predictive] models.”

He acknowledges, however, that although he expects the competition approach to remain the best one as Kaggle builds out its value chain, it might not be. But Kaggle is a data science startup, he said, so it’s not about to act solely on his intuition: “We have to test these hypotheses.”

Feature image courtesy of Shutterstock user VLADGRIN.


GigaOM