Why Apple, eBay, and Walmart have some of the biggest data warehouses you’ve ever seen

In an age of Hadoop and a general analytics revolution, it’s easy to poke fun at legacy data warehouse vendors such as Teradata. Some people might even call it fun. After all, they sell expensive appliances and weren’t built from the ground up to handle the unstructured data that most people think of when they think of “big data.”

But whatever you think about Teradata’s approach to handling big data workloads, make no mistake about the company’s clout: It has been around for decades, and it’s still analyzing boatloads of data for some of the biggest names in business. I spent a day in February touring the Teradata Labs facility in San Diego, and although I heard all about the technology and the company’s vision for a Teradata-Hadoop-Aster analytics super-environment, what stuck out most were the users. Walmart, eBay, Continental … Apple.

Here’s how they’re all using Teradata and at what scale (try not to faint when you think of the bill):

  • Apple: Apple is operating a multiple-petabyte Teradata system (that became apparent during its iCloud launch in 2011) and, I learned, was Teradata’s “fastest ever customer to a petabyte.” Apple uses the data warehouse to get a better understanding of its customers across product groups. Now every piece of identifiable information, and those iTunes interactions generate a lot of data, goes into the system so the company knows who’s who and what they’re up to.
Rows of Teradata appliances.


  • Walmart: The retail giant deployed Teradata’s first-ever terabyte-scale database in 1992, and it has grown, uh, a bit since then. Its operational system was at 2.5 petabytes as of 2008, and is certainly leaps and bounds bigger by now — likely well into the double digits when you consider it operates separate ones for Walmart and Sam’s Club as well as a backup system. The analytics efforts have essentially helped Walmart become a massive consignment shop. It tells suppliers, “You have three feet of shelf space. Optimize it.” And then it gives them any data they could possibly need to determine what’s selling, how fast and even whether they should redesign their packaging to fit more on the shelves.
  • eBay: eBay has two systems in place, and they’re both big. Its primary data warehouse is 9.2 petabytes; its “singularity system” that stores web clicks and other “big” data is more than 40 petabytes. It has a single table that’s 1 trillion rows. Yes, this is smaller than the 50 petabytes’ worth of Hadoop capacity eBay added last year, but Teradata is quick to point out that all of its systems support data into and out of Hadoop, so it’s not as if eBay is operating two entirely distinct data environments.

Of course, Teradata has lots of other petabyte-scale customers, with Verizon, AT&T and Bank of America among them. Here are a few more interesting use cases:

  • Harrah’s (now part of the Caesars Entertainment casino empire) understands how much money particular gamblers can afford to lose in a day before they won’t come back the next day.
  • Disney is rolling out new bracelet tickets equipped with GPS and NFC that track everything visitors do while inside Disney’s amusement parks. The New York Times detailed the privacy implications of this move in a January article.
  • A manufacturing customer generates 20 terabytes of data per hour while testing products, although that volume is ultimately reduced to about 1 terabyte once only the valuable data is kept.
  • At some point, Continental Airlines decided it wanted to keep its customers happy and began assessing them by lifetime value (which, it turns out, is often inversely related to frequent-flyer status). It also began making alternative arrangements for them as soon as it realized flights would be delayed.
  • A luxury car company used Aster Data to analyze the pattern of failures for various components inside its cars. It found out that lighting, seats and infotainment often failed together (they’re on the same circuit) and began inspecting all three whenever a customer came in for service on any of them.


None of this means Teradata is destined to continue being a huge name in analytics (Scott Yara, co-founder of rival EMC Greenplum, recently called data warehouses this generation’s mainframe), but it’s still interesting to learn how big companies are analyzing their data, regardless of what they’re running on. And with exabytes’ worth of data no doubt residing in customer systems across the world, Teradata isn’t going anywhere soon.


GigaOM