Want cheap high-performance computing at scale like Google? Look to GPUs

Some startups pitch themselves as a way to bring impressive things Google can do, from real-time website updates to machine learning, to everyone else. One of Google's most ambitious next-generation projects is the so-called Google Brain, which aims to make search and other products behave more intuitively based on information Google already knows about its users. Now one researcher behind Google Brain is showing how other companies can run their own deep-learning computations at a non-Google price, using GPUs and Infiniband interconnects.

As Wired noted Monday, a group of researchers has published a paper showing how GPUs can shoulder as much of this sort of heavy-duty work as CPUs, particularly the fast training of neural networks. The group includes Stanford professor Andrew Ng, who worked on the Google Brain project alongside Google Fellow Jeff Dean (Dean will speak at our Structure conference in San Francisco on Wednesday). According to the Wired article, such a system could cost $20,000, compared with the roughly $1 million price tag of Google's project.

The cluster used in the research consisted of 16 servers, each containing two quad-core processors, four Nvidia GPUs and an Infiniband adapter, all connected through an Infiniband switch. "This particular server configuration was chosen to balance the number of GPUs with CPUs, which we have found to be important for large-scale deep learning," the researchers wrote in the paper. The system is also far more efficient than a previous iteration:

With our system we have shown that we can comfortably train networks with well over 11 billion parameters—more than 6.5 times as large as the one reported in (Dean et al., 2012) (the largest previous network), and using fewer than 2% as many machines.
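To put these figures side by side, the short Python sketch below multiplies out the cluster configuration and works backward from the ratios in the researchers' claim and the Wired cost estimate. It uses only numbers stated in this article; the variable names are illustrative, and the derived values for the earlier system are rough estimates implied by the "well over" and "more than" wording, not figures the researchers reported.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in this
# article. "Well over" and "more than" in the quote make the implied values
# for the earlier system rough estimates, not exact numbers.

# Cluster described in the paper: 16 servers, each with two quad-core
# CPUs and four Nvidia GPUs.
SERVERS = 16
CPUS_PER_SERVER = 2
CORES_PER_CPU = 4
GPUS_PER_SERVER = 4

total_gpus = SERVERS * GPUS_PER_SERVER                       # 64
total_cpu_cores = SERVERS * CPUS_PER_SERVER * CORES_PER_CPU  # 128

# Scale claims from the quoted passage.
NEW_PARAMS = 11e9     # "well over 11 billion parameters"
SIZE_RATIO = 6.5      # "more than 6.5 times as large"
MACHINE_PERCENT = 2   # "fewer than 2% as many machines"

# Earlier network size implied by the ratio (approximate).
prev_params_approx = NEW_PARAMS / SIZE_RATIO
# 16 machines is under 2% of the earlier count, so that count exceeds 800.
prev_machines_min = SERVERS * 100 / MACHINE_PERCENT

# Cost figures attributed to the Wired article, in dollars.
GPU_SYSTEM_COST = 20_000
GOOGLE_SYSTEM_COST = 1_000_000  # approximate

cost_multiple = GOOGLE_SYSTEM_COST / GPU_SYSTEM_COST

print(f"cluster: {total_gpus} GPUs, {total_cpu_cores} CPU cores")
print(f"earlier network: roughly {prev_params_approx / 1e9:.1f}B parameters")
print(f"earlier cluster: more than {prev_machines_min:.0f} machines")
print(f"cost: about {cost_multiple:.0f}x cheaper")
```

Run as a plain script, it prints 64 GPUs and 128 CPU cores for the cluster, an earlier network of roughly 1.7 billion parameters, an earlier cluster of more than 800 machines, and a cost gap of about 50x. The 2% machine ratio and the $20,000 versus $1 million cost ratio happen to land in the same place.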

The research looks like another endorsement of the power of GPUs, and a score for Nvidia and other GPU makers such as AMD. Nvidia GPUs have already gained traction inside the fastest supercomputers in existence, and a trickle-down effect to more traditional data centers now looks a bit more plausible. (Then again, the record-breaking Tianhe-2 supercomputer in China relies not on GPUs but on a blend of Intel Xeon E5-2600 chips and Xeon Phi co-processors, according to a report from Data Center Knowledge, so the GPU campaign has a way to go.)

Of course, while the research lends further credence to the argument that on-premise high-performance computing is not unrealistic, it arrives at a time when GPU-based cloud offerings are already available. Amazon Web Services long ago figured out that companies want to do high-performance computing with GPUs, and AWS' Cluster GPU Instances, which incorporate Nvidia GPUs, have seen some commercial use. Peer1, SGI and others also play in this market.

Interestingly, Google itself has not released a GPU option for its Cloud Platform, even though it relies heavily on GPUs for its Google Brain work. The researchers' innovations could help Google develop a sound business model for selling GPU time as a service.

The research could also speed up other artificial-intelligence projects involving GPUs. Take robotic bees, for example, which need a common brain to make sense of all the data they collect. Recognizing patterns more quickly, with fewer machines and at a lower price, could help smart bees multiply.

GigaOM