How Facebook made it possible to geo-tag everything

It all seems so easy: You log into Facebook, update your status, tell everyone where you are and — voila! — your Timeline is geospatial. You're not only showing your friends what you did or what you were thinking; your timeline map also shows them where you were when you snapped that photo or felt that particular way. But while adding location is just one extra step for you, building that capability was a tad more complicated for Facebook.

In a blog post on the Facebook Engineering page today, Facebook's Karan Mangla describes the serious big-data and architectural handiwork it took to add broad geo-tagging capabilities. Finding locations for Facebook Places was relatively easy when geo-location was only possible for updates sent from mobile devices reporting GPS coordinates, but letting users add location data to any update — even retroactively — meant opening up the range of possible places beyond a user's current physical location.

Dealing with a larger data set for every update meant Facebook had to change its infrastructure in order to make each update as efficient as possible, lest it risk a slow user experience. Here’s how Mangla said it accomplished this at the database layer:

Depending on the use case, results can be biased toward places closer to the user or provide completely global search. Since the candidate set of places increases significantly in this new system, ranking each of these places to find those that are most relevant to the current user becomes very important. With each place in the index, we store a large set of features to improve ranking, including check-ins and likes received by the place and our best estimate for the opening/closing time for the place. Then we run models generated by machine learning to select the most relevant places based on these features.
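
To make that ranking step concrete, here's a minimal Python sketch. Everything in it is an assumption for illustration (the Place record, the three features, and the hand-picked weights standing in for a model trained offline), since Facebook hasn't published its actual feature set or models:

```python
import math
from dataclasses import dataclass

@dataclass
class Place:
    place_id: int
    lat: float
    lon: float
    checkin_count: int
    like_count: int
    is_open_now: bool

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical weights standing in for a model trained offline;
# the real system runs machine-learned models over many more features.
WEIGHTS = {"popularity": 0.6, "proximity": 0.3, "open_now": 0.1}

def score_place(place: Place, user_lat: float, user_lon: float) -> float:
    """Combine per-place features into a single relevance score."""
    popularity = math.log1p(place.checkin_count + place.like_count)
    distance = haversine_km(user_lat, user_lon, place.lat, place.lon)
    proximity = 1.0 / (1.0 + distance)  # decays toward 0 for faraway places
    return (WEIGHTS["popularity"] * popularity
            + WEIGHTS["proximity"] * proximity
            + WEIGHTS["open_now"] * (1.0 if place.is_open_now else 0.0))

def rank_candidates(candidates, user_lat, user_lon, k=10):
    """Score every candidate place and return the top k."""
    return sorted(candidates,
                  key=lambda p: score_place(p, user_lat, user_lon),
                  reverse=True)[:k]
```

A real learned ranker would replace the fixed weights, but the shape of the computation is the same: per-place features in, relevance score out, then sort and truncate.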

With any complex platform such as Facebook, though, there's always a catch. Mangla noted that while displaying chronological information only requires a server to grab the last few dozen pieces of a user's activity, displaying geospatial information requires grabbing all the activity associated with any given place on the user's map. Facebook had a fix:

To manage this data load, we created infrastructure to farm out data fetching to multiple servers. On every page load, a single server fetches the IDs of all pieces of content that can be displayed for the current user. This server then breaks up this data into smaller chunks, and each chunk is sent in a request to another server to actually fetch the data and do privacy checks. The responses from these servers are then combined to create the timeline map display.
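
What Mangla describes is a classic scatter-gather pattern. Here's a rough Python sketch, with hypothetical helpers (fetch_content_ids and fetch_and_check are stand-ins for the real services) and a thread pool playing the role of the extra servers:

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 50  # assumed chunk size; the real value isn't public

def fetch_content_ids(user_id):
    """Stand-in for the single server that gathers every content ID
    that could appear on the current user's timeline map."""
    return list(range(1, 501))  # placeholder IDs

def fetch_and_check(user_id, id_chunk):
    """Stand-in for a worker server: load each item and apply privacy
    checks, returning only what this viewer is allowed to see."""
    return [{"id": i, "visible": True} for i in id_chunk]

def load_timeline_map(user_id, workers=8):
    ids = fetch_content_ids(user_id)
    chunks = [ids[i:i + CHUNK_SIZE] for i in range(0, len(ids), CHUNK_SIZE)]
    results = []
    # Scatter: each chunk goes to a separate worker in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk_result in pool.map(lambda c: fetch_and_check(user_id, c),
                                     chunks):
            # Gather: combine per-chunk responses into one display payload.
            results.extend(item for item in chunk_result if item["visible"])
    return results
```

The notable design choice, per the quote above, is that privacy checks run on the workers, next to the data, so the coordinating server only ever assembles results the viewer is already cleared to see.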

The rest of Mangla’s post provides some deeper insights into how Facebook went about predicting locations for previously uploaded photos, as well as its plan for launching APIs that will let third-party apps update users’ timeline maps or feed users relevant information based on where they are at any given time. It’s well worth reading.

The bigger picture, though, is further confirmation of how much work goes into making a web platform as sticky as possible to ensure maximum user and developer engagement. That means not only adding new features, but also making them perform as well as possible. That's why, as I've written before, anyone expecting to buy Facebook stock and get rich quick might be in for a surprise. It takes a lot of time and money to get this stuff right, and Facebook plans to spend whatever it takes to keep itself relevant over the long haul.

Of course, assuming that all of this location data will eventually make its way to advertisers who’ll then be better able to target users with relevant ads, Facebook should still bring in enough money to keep everyone happy.
