Zynga open sources its server-performance monitor

Zynga has open sourced zPerfmon, its tool for monitoring the performance of its thousands of social-gaming servers. The company announced the decision in a blog post on Friday afternoon. The code is available on GitHub.

Web companies build and open source stuff all the time, of course, but what sets zPerfmon apart is its scale: It’s a single-server system that, according to blog post author Binu Phillip, every day processes 150 gigabytes of data, adds 100 million database rows and offers up “50 million profiles and 100s of ways to view them.”

“zPerfmon isn’t by any stretch of imagination ‘light and agile with subroutines connected like a string of pearls,’” Phillip acknowledges. It’s a single machine processing myriad file types and job types, with dozens of ways to view and access data for each different game. “Processes exec others in a daisy chaining frenzy,” he adds. “… It is also remarkably stable and resilient.”

Phillip offers a fairly technical explanation of zPerfmon in his post, but here’s the gist of how it operates:

“The server is a processing engine with a heartbeat of 30 minutes. All data that is available at 30 minute boundaries are grouped, sliced and diced. In addition to profiles, server keeps user and instance counts and system metrics. All of this data is keyed with timestamps. The timestamps make it possible to dig down from increase in instance count to a spike in CPU to a page which had a missing break in a foreach() loop.”
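
To make the 30-minute heartbeat concrete, here is a minimal Python sketch of grouping timestamped samples at 30-minute boundaries. It is an illustration only, not zPerfmon's actual code: the function names, the sample tuple layout and the metric names are all assumptions made for this example.

```python
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # the 30-minute "heartbeat" described in the post


def window_start(ts):
    """Round a Unix timestamp down to the 30-minute boundary containing it."""
    ts = int(ts)
    return ts - ts % WINDOW_SECONDS


def group_by_window(samples):
    """Group (timestamp, metric, value) samples by window start, then by metric.

    The tuple layout is hypothetical; it only stands in for "data keyed
    with timestamps".
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, metric, value in samples:
        buckets[window_start(ts)][metric].append(value)
    return buckets


# Hypothetical samples: three land in one window, one in the next.
samples = [
    (18005, "cpu", 0.40),
    (18900, "cpu", 0.60),
    (19799, "instances", 12),
    (19801, "cpu", 0.95),
]
buckets = group_by_window(samples)
```

Because every metric in a window shares the same key, a jump in instance count and a CPU spike in the same half hour end up side by side, which is the kind of drill-down the quote describes.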

There are numerous tools already around for monitoring and troubleshooting server performance, from commercial software such as Splunk to open source software such as Ganglia, but web companies often like to build their own stuff. This makes some sense considering the often-unique nature of each company’s system architecture and analysis needs. Facebook, for example, has built at least two monitoring systems of its own: One is a Hadoop-backed repository called Operations Data Store, while the other is a real-time tool called Claspin, which manages the company’s cache infrastructure.

Zynga uses zPerfmon to monitor its production servers, which run on the company’s custom-built zCloud architecture and which experience traffic unique to that setup and to Zynga users’ behavior.

GigaOM