What happens when a site goes viral and more than doubles its user base every month? It breaks, of course. Here’s how the popular photo-based social network Pinterest handled the problem, along with a few tips from Pinterest engineers on how others might avoid the same sort of trouble.
Marty Weiner and Yashh Nelapati of Pinterest shared the lessons from that experience on Thursday at the Surge Conference in Baltimore, Md., with most of the tips being about how the site has scaled its MySQL database. It’s a presentation the guys have given before, and slides can be found here. But for those who want the big picture in a few bullet points, here ya go.
Simply put, Pinterest learned quickly that too much complexity was its enemy if it wanted its infrastructure to scale as fast as the site was growing. Pinterest began in March 2010 hosted on Rackspace using one MySQL database and one small web engine. By the time it launched in January 2011, it had migrated to Amazon’s EC2 and was running a few more MySQL databases, a few Nginx web servers, MongoDB and TaskQueue. As it transitioned to its big-growth stage, it began running more and more tools, including Memcached, Redis and at least three others.
So the first lesson Weiner shared was not to do that: instead of running a bunch of tools, simplify. The tools he decided to focus on shared the following characteristics: they were free; they had large and happy user bases; and they all offered good or very good performance. Those tools were Amazon Web Services, Memcached, Redis and MySQL. Granted, getting them to scale properly wasn’t an engineering-free task, but at least everything was manageable when the work was done.
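To give a sense of how a pared-down stack like that typically fits together, here is a minimal cache-aside sketch in Python: Redis (or Memcached) sits in front of MySQL and absorbs repeat reads. The hostnames, credentials, table and key names are illustrative placeholders, not Pinterest’s actual setup.

```python
# Minimal cache-aside sketch: check the cache first, fall back to MySQL,
# then repopulate the cache. Hosts, credentials and schema are placeholders.
import json
import redis
import pymysql

cache = redis.Redis(host="cache.internal", port=6379)
db = pymysql.connect(host="mysql.internal", user="app", password="...",
                     database="pins", cursorclass=pymysql.cursors.DictCursor)

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: MySQL never sees the read
    with db.cursor() as cur:               # cache miss: go to the database
        cur.execute("SELECT id, username FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()
    if row is not None:
        cache.set(key, json.dumps(row), ex=300)  # repopulate with a 5-minute TTL
    return row
```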
One of the tougher choices Pinterest had to make was a decision between clustering and sharding. Weiner described a continuum between the two, portraying clustering as automatic distribution of data through tools like Cassandra, HBase and Membase, and sharding as a manual act of deciding where to put data on a machine-by-machine basis. Given his choice — sharding — he’s clearly a fan of control for his database technologies.
He complained that while the automatic distribution of data across servers was cool and easy, it also came with a big point of failure. Because the cluster-management software runs on every database node and controls how data is spread across multiple servers, bugs and errors in that code get replicated across the entire cluster.
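To make the contrast concrete, here is a minimal sketch of what manual sharding can look like: the application, not a cluster manager, decides which MySQL host holds a given user’s data. The shard count and host map below are illustrative assumptions, not Pinterest’s actual layout.

```python
# Manual sharding sketch: the application owns the mapping from a user to a
# physical MySQL host, so there is no cluster-management layer to go wrong.
NUM_SHARDS = 512

# Ranges of logical shards pinned to physical machines; moving shards means
# editing this map (and copying the data), not re-hashing every key.
SHARD_MAP = {
    range(0, 256):   "mysql-01.internal",
    range(256, 512): "mysql-02.internal",
}

def shard_for_user(user_id):
    """Pick a logical shard for a user; everything that user owns goes there."""
    return user_id % NUM_SHARDS

def host_for_shard(shard_id):
    """Resolve a logical shard to the machine that currently holds it."""
    for shard_range, host in SHARD_MAP.items():
        if shard_id in shard_range:
            return host
    raise KeyError(f"no host configured for shard {shard_id}")

# Example: user 1234567 -> shard 135 -> mysql-01.internal
print(host_for_shard(shard_for_user(1234567)))
```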
The rest of his talk focused on how to shard a lot of data and keep growing. For those interested in a deep dive on that technology, check out a video of the same talk he gave in May. For those thinking about building new scalable apps, the key lessons are probably these: when building a web site you hope to scale, keep it simple, go for popular and well-liked tools that are free, and seriously consider the tradeoffs between control and ease of use.
And for those of you who just like pinning photos to Pinterest, you can sleep well knowing that, thanks to the way the site sharded its database, all of your pins likely reside on the same server, right next to your user ID. And that’s a good thing, because it makes it much easier to scale the service to more users without everything breaking down.
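One common way to get that co-location is to embed the shard ID in every object ID, so a pin can always be routed to the same machine as its owner without any lookup table. The bit layout below is an illustrative assumption, not necessarily the exact scheme Pinterest uses.

```python
# Sketch of shard-aware IDs: pack the shard, an object type, and a per-shard
# counter into one 64-bit integer. The 16/10/38-bit split is illustrative.

def make_id(shard_id, type_id, local_id):
    """Build an ID whose top bits record which shard the row lives on."""
    return (shard_id << 48) | (type_id << 38) | local_id

def shard_of(object_id):
    """Recover the shard from any ID, with no lookup table needed."""
    return object_id >> 48

USER, PIN = 1, 2

user_id = make_id(shard_id=135, type_id=USER, local_id=42)
pin_id  = make_id(shard_id=shard_of(user_id), type_id=PIN, local_id=7)

# Both IDs decode to shard 135, so the pin row lives on the same MySQL
# host as its owner's user row.
assert shard_of(user_id) == shard_of(pin_id) == 135
```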