The advent of the $1,000 genome is bound to revolutionize researchers’ understanding of human health, but ever-lower prices on DNA sequencing are only half the battle. Researchers also need to analyze the raw data that comes off sequencing machines, which can range from many gigabytes to terabytes, and that analysis can cost well more than the sequencing itself. That’s why a collection of startups is trying to stake claims as essential parts of the genomics ecosystem by ensuring that analysis doesn’t become the bottleneck that slows progress.
Domain expertise, statistics and HPC, unite!
The latest is Bina Technologies, which just emerged from stealth mode last week and is bringing an Apple-like business model to genomics. The company, which grew out of a research project at Stanford University, relies on its Bina Box appliance to make genome analysis faster than typically possible without the benefit of having a supercomputer and a research network on hand.
According to Bina CEO Narges Asadi, who co-founded the company while completing her Ph.D. at Stanford, the appliance came about as part of a mission to solve a disconnect among the stakeholders in cancer research. Improving the analysis of cancer data required input from medical researchers, statisticians and high-performance computing experts, “but people are not speaking even the same language,” she said. While they’re all headed in the same direction, their paths rarely converge to harness peak velocity.
Asadi and her team solved that problem by developing a system that merged the three areas into one. With Bina, researchers can develop analysis pipelines that are tuned at both the algorithmic and silicon levels to run across a mix of CPUs, GPUs and FPGAs, all of which are present within the purpose-built box. Applications get what they need in order to perform their best, and Bina says results can be processed 10 to 100 times faster (hours instead of days) than running jobs on the Amazon Web Services cloud, which has proven very popular for genomics workloads thanks to its supercomputer-like performance.
That being said, a chart Bina uses to illustrate the performance difference compares the Bina Box to a single eight-core AWS instance rather than a cluster of those high-performance instances.
The Apple analogy
Asadi answers the inevitable question of whether research centers will want special appliances instead of using the cloud or generic servers by pointing to Apple. That company’s devices and computers can be a little more expensive and more difficult to tinker with than alternatives, but they’re also designed specifically with Apple’s operating system and applications in mind. It’s an analogy other companies, such as cloud computing startup Nebula, also use to justify their appliance-based businesses.
Not that Bina dismisses the cloud. If Bina’s software is the Mac OS to the Bina Box’s iMac, the Bina Cloud is the company’s iCloud. Once the box processes the raw sequencing data and compresses it into a smaller volume (up to 1,000 times smaller), the data is shipped to the Bina Cloud, where it’s stored and can be easily accessed and shared. In fact, Asadi said that’s where the most innovative research likely will take place. While Bina’s appliance handles the necessary first steps of genome analysis (e.g., determining how a genome is unique), it’s the resulting data sets that doctors, specialists and others can access to really make sense of it all.
Presumably, Bina is referring to companies such as DNAnexus when it compares its solution to entirely cloud-based approaches. DNAnexus is another Silicon Valley startup trying to democratize genome analysis, relying on the processing power and centralized nature of the cloud to serve as a platform for analyzing and collaborating on DNA data. Another startup, St. Louis-based Appistry, has taken a somewhat different approach, building its own high-powered cloud service and developing its own algorithms specially designed for genome analysis.
In the end, it’s all about the data
Regardless of which approach a researcher takes to solving the problem of sequenced genome data (they all have unique benefits), the underlying trend driving innovation is the deluge of genome data itself. Asadi said the biggest difference now compared with past efforts to analyze health data is that we have so much available. There are 30,000 fully sequenced genomes available right now, and some predict there will be 10 million in five years.
That means researchers can study DNA at a much more granular level than previously possible, Asadi said, and they can analyze findings across huge data sets to identify previously undetectable patterns. Especially with cancer, she said, each case is relatively unique, shaped by many conditions and factors. If we’re going to make significant progress on treating it, we’ll need to know exactly what’s going on in any given case and how similar cases have played out. The more data, the more accurate the diagnosis and treatment.
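To put the projection cited above in perspective (30,000 sequenced genomes today, a predicted 10 million in five years), here is a rough back-of-the-envelope sketch in Python. It assumes smooth compound growth over the whole period, which is an illustrative simplification and not something Bina or Asadi stated; the function name `implied_growth` is hypothetical.

```python
import math

def implied_growth(start, end, years):
    """Return (total factor, annual factor, doubling time in months).

    Assumes steady compound growth, which is an illustrative
    simplification rather than a claim from the article.
    """
    total_factor = end / start
    annual_factor = total_factor ** (1 / years)
    doubling_months = 12 * math.log(2) / math.log(annual_factor)
    return total_factor, annual_factor, doubling_months

# Figures quoted in the article: 30,000 genomes now, 10 million in five years.
total, annual, doubling = implied_growth(30_000, 10_000_000, 5)
print(f"Total growth: about {total:.0f}x")
print(f"Implied annual growth: about {annual:.1f}x per year")
print(f"Implied doubling time: about {doubling:.1f} months")
```

Under that assumption, the prediction works out to the number of sequenced genomes roughly tripling every year, or doubling about every seven months.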
Feature image courtesy of Keith Edkins.