Some of the U.S. government’s most research-intensive agencies want your help to come up with better ways to analyze their expansive data sets. NASA, along with the National Science Foundation and the Department of Energy, launched a competition on TopCoder called the Big Data Challenge series. Essentially, it’s a competition to crowdsource a solution to the very big problem of fragmented and incompatible federal data.
The first contest in the series involves answering a question, albeit a difficult one. From the contest page:
“How can we make heterogeneous (dissimilar and incompatible) data sets homogeneous (uniformly accessible, compatible, able to be grouped and/or matched) so usable information can be extracted? How can information then be converted into real knowledge that can inform critical decisions and solve societal challenges?”
This is a problem that's magnified in government agencies, but one that plagues companies of all types that try to get started with big data. While future big data strategies might mandate a particular data format or other standards, the past and, often, the present are a messy pile of stuff created by different divisions within different agencies or departments. The dream of creating spectacular algorithms, building beautiful visualizations and uncovering hidden insights often comes true only after untold man-hours spent cleaning and munging data (often with Hadoop) into formats that software systems can work with.
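To make that munging step concrete, here is a minimal Python sketch of what "making heterogeneous data homogeneous" can look like at very small scale. Everything in it is an assumption for illustration: the two record layouts, the field names (`station`, `site_id`, `temp_c`, `temp_f`) and the target schema are hypothetical and are not taken from the contest or from any agency's actual data.

```python
from datetime import datetime

# Hypothetical records from two sources that describe the same kind of
# measurement but disagree on field names, date formats and units.
agency_a = [{"station": "A-1", "date": "2012-10-03", "temp_c": 21.5}]
agency_b = [{"site_id": "B-7", "observed": "10/03/2012", "temp_f": 70.7}]


def from_agency_a(rec):
    """Map a source-A record onto the shared (assumed) schema."""
    return {
        "site": rec["station"],
        "date": datetime.strptime(rec["date"], "%Y-%m-%d").date(),
        "temp_c": rec["temp_c"],
    }


def from_agency_b(rec):
    """Map a source-B record onto the shared schema, converting F to C."""
    return {
        "site": rec["site_id"],
        "date": datetime.strptime(rec["observed"], "%m/%d/%Y").date(),
        "temp_c": round((rec["temp_f"] - 32) * 5 / 9, 1),
    }


# After normalization, every record has the same keys, types and units,
# so the two sources can be grouped, matched or compared directly.
unified = [from_agency_a(r) for r in agency_a] + [from_agency_b(r) for r in agency_b]

for row in unified:
    print(row)
```

The design choice here is one small adapter function per source, all feeding a single target schema. Real federal data sets would likely need far more than this, such as handling missing values, conflicting identifiers and much larger volumes, which is where tools like Hadoop come in.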
The registration deadline for the first contest is Oct. 13, and the submission deadline is Oct. 19. The first-place prize is $1,000. Later contests will focus on more domain-specific fields such as energy, health care and earth science.
Although the possibility of influencing big data strategies within some of the country’s most advanced agencies might be novel, crowdsourcing solutions to these types of difficult problems is becoming rather common. TopCoder exists as a competitive platform for solving application development and design issues, and the Big Data Challenge is just the latest contest it’s hosting for NASA via the agency’s Center of Excellence for Collaborative Innovation and NASA Tournament Lab. There are also the general-purpose platform InnoCentive and the wildly popular data science platform Kaggle.
Feature image courtesy of Shutterstock user Prokhorova Nadila.