Etsy unveils its infrastructure (and its Supermicro love)

Online marketplace Etsy has been on a mission of openness lately. Last week it gave an in-depth explanation of a few recent outages, and on Friday it shared the details of the hardware architecture that powers its popular business. Etsy isn’t Facebook in terms of scale or specialization, but it’s always interesting to see what’s under the covers of growing web companies.

And under the covers at Etsy is a lot of Supermicro gear. According to blog post author Laurie Denness, a single class of Supermicro 2U, 4-node chassis powers a large number of workloads, including memcached and web serving. Like many large web companies, including eBay, Etsy tries to stick with a limited hardware stack that’s versatile enough to handle multiple workloads.

With that in mind, Denness wrote, “A general configuration for these would be 2x 8 core Intel E5620 CPUs (@ 2.40ghz), 12GB-96GB of RAM, and either a 600GB 7200rpm hard disk or an Intel 160GB SSD.”

She also notes the lack of RAID in this system, which means these nodes have no disk-level redundancy: a failed drive takes its node out of service. However, because Etsy uses Chef, and another tool called Cobbler, to quickly rebuild failed nodes, Etsy doesn’t think wasting power on RAID is necessary: “In our view, why power two drives when our datacenter staff can replace the drive and rebuild the machine and have it back in production in under 20 minutes?”
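To put the “why power two drives” argument in concrete terms, here is a back-of-the-envelope sketch of the energy a mirrored (RAID1) second drive would add across a fleet. The node count and per-drive wattage are hypothetical placeholders, not figures from Etsy’s post.

```python
# Hypothetical back-of-the-envelope estimate: extra energy drawn by mirroring
# (RAID1) each node's drive instead of running a single drive.
# All parameter values below are illustrative assumptions, not Etsy figures.

HOURS_PER_YEAR = 24 * 365


def raid1_extra_energy_kwh(nodes: int, watts_per_drive: float) -> float:
    """Annual energy (kWh) spent powering one additional, mirrored drive per node."""
    extra_watts = nodes * watts_per_drive  # one extra drive in every node
    return extra_watts * HOURS_PER_YEAR / 1000.0


if __name__ == "__main__":
    # Assumed: 400 nodes, ~6 W average draw per 7200rpm drive.
    kwh = raid1_extra_energy_kwh(nodes=400, watts_per_drive=6.0)
    print(f"{kwh:,.0f} kWh per year")
```

With those placeholder values the script prints 21,024 kWh per year; the trade-off Etsy describes is that energy (plus the cost of the extra drives) against a roughly 20-minute rebuild when a single drive fails.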

Etsy beefs up this same general-purpose chassis for its search engine, replacing the 8-core processors with 16-core Sandy Bridge processors and adding a boatload of solid-state storage:

[This setup] gives us machines that can handle over 4 times the workload of the generic nodes above, whilst using the same density configuration and not that much more power. … The nodes have 96GB of RAM and a single 800GB SSD for the indexes. This follows the same pattern of not bothering with RAID; the SSD is perfectly fast enough on its own, and we have BitTorrent index distribution which means getting the indexes to the machine is super fast.
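Why does BitTorrent make index distribution fast? A standard textbook model of minimum distribution time captures the intuition: a single server must upload every copy itself, while in a swarm each node that has received pieces can re-upload them to others. The sketch below implements that idealized model; the index size, node count and link speeds are hypothetical, and the model ignores protocol overhead, so treat it as an illustration rather than a description of Etsy’s tooling.

```python
# Idealized minimum time to copy one file (e.g. a search index) to N nodes.
# Textbook model: ignores protocol overhead, piece scheduling and disk speed.
# All inputs are hypothetical; sizes in GB, rates in GB/s, results in seconds.


def client_server_time(file_gb: float, nodes: int, server_up: float, min_down: float) -> float:
    """One server uploads a full copy to every node by itself."""
    return max(nodes * file_gb / server_up, file_gb / min_down)


def p2p_time(file_gb: float, nodes: int, server_up: float, min_down: float,
             peer_up: float) -> float:
    """BitTorrent-style swarm: every node also re-uploads pieces it already has."""
    total_upload = server_up + nodes * peer_up
    return max(file_gb / server_up, file_gb / min_down, nodes * file_gb / total_upload)


if __name__ == "__main__":
    # Hypothetical: 20 GB index, 30 search nodes, 1.25 GB/s (~10 Gbit/s) links everywhere.
    args = dict(file_gb=20.0, nodes=30, server_up=1.25, min_down=1.25)
    print(f"client-server: {client_server_time(**args):.0f} s")
    print(f"swarm:         {p2p_time(**args, peer_up=1.25):.0f} s")
```

With these made-up inputs the model’s lower bound drops from 480 seconds for a single server to 16 seconds for the swarm, because total upload capacity grows with every node that joins. Real transfers will be slower than either bound.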

Etsy’s Hadoop nodes

Supermicro is the name of the game for Hadoop and backup at Etsy as well. Although the post doesn’t disclose how big Etsy’s Hadoop cluster is, Denness notes that the company crams 96 cores, 384GB of RAM and 24TB of storage into every 2U of rack space. For backup, it’s a 4U Supermicro box packing 36 2TB hard drives that deliver “a blistering 1.2 gigabytes/second sequential write throughput and a total of 60TB of usable disk space across two RAID6 volumes.”
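The backup box’s numbers can be sanity-checked with simple RAID6 arithmetic, since each RAID6 volume gives up two drives’ worth of capacity to parity. This sketch assumes the 36 drives are split evenly into two 18-drive volumes with no hot spares; the post doesn’t give the layout, so that split is an assumption.

```python
# Sketch: RAID6 capacity and per-drive throughput for the backup box.
# Assumption (not stated in the post): two 18-drive RAID6 volumes, no hot spares.

RAID6_PARITY_DRIVES = 2  # each RAID6 volume stores two drives' worth of parity


def raid6_usable_tb(drives_per_volume: int, drive_tb: float, volumes: int = 1) -> float:
    """Usable capacity in decimal TB across identical RAID6 volumes."""
    if drives_per_volume < 4:
        raise ValueError("RAID6 needs at least 4 drives per volume")
    data_drives = drives_per_volume - RAID6_PARITY_DRIVES
    return volumes * data_drives * drive_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


def per_data_drive_mb_s(total_gb_s: float, drives_per_volume: int, volumes: int) -> float:
    """Average user-data write rate per data drive, in MB/s (parity writes excluded)."""
    data_drives = volumes * (drives_per_volume - RAID6_PARITY_DRIVES)
    return total_gb_s * 1000 / data_drives


if __name__ == "__main__":
    usable = raid6_usable_tb(drives_per_volume=18, drive_tb=2.0, volumes=2)
    print(f"{usable:.1f} TB usable ({tb_to_tib(usable):.1f} TiB)")
    print(f"{per_data_drive_mb_s(1.2, 18, 2):.1f} MB/s per data drive")
```

Under that assumed layout the volumes hold 64TB (about 58.2TiB) before filesystem overhead, in the same range as the quoted 60TB; the post doesn’t say exactly how its figure was counted. Spreading 1.2GB/s across the 32 data drives works out to roughly 37.5MB/s of user data per drive.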

HP also gets a little love at Etsy, powering both its MySQL database and a handful of special jobs, such as its Hadoop NameNodes “that don’t need much horsepower, but we deem important enough to have RAID,” Denness wrote.

Etsy’s hardware choices aren’t earth-shattering news by any means, but this type of openness is critical as more and more businesses make their home on the web. Speaking openly about this stuff (hint, hint, Twitter) gives other companies ideas about how to improve their systems, while also giving the company that shares the chance to learn from others’ suggestions (à la the code contributions that benefit open source projects). It also helps advance the idea of a crowdsourced model for performance benchmarking, where everyone can see setups and performance data from in-the-wild systems that haven’t been tuned to the nth degree by vendors.

In a webscale world where software reigns supreme, a rising hardware tide lifts all boats.

Feature image courtesy of Shutterstock user MilousSK.



GigaOM