I spent the morning listening to Mark Zuckerberg and company talk about Facebook’s next big product, Graph Search, which (for now) lets people search using people, places, photos and interests as search vectors. Reactions to the news have ranged from bored to muted to wildly exultant.
However, the event left me with more questions than answers. While Danny Sullivan’s overview answers many of those questions, I still wasn’t clear on how it works.
So, how does this Unicorn-based Graph Search work? From what I understand, there are two parts to the search, both built in-house. The first part is based on natural language processing and is essentially driven by users’ questions. The second part is the one that brings back the answers.
The second part is built on an internal retrieval tool called Unicorn that has been in use at Facebook for a while. Facebook has created an index of all objects on the social platform. On the Facebook Engineering blog, Lars Rasmussen (who, along with Tom Stocky, was responsible for building Graph Search) writes:
Using traditional information-retrieval systems to mix keyword and structured queries is fairly well understood. But we needed the system also to find answers more than a single connection away, such as “restaurants liked by my friends from India.” Here we were in luck: one of our three existing systems, Unicorn, was designed exactly with this in mind.
The search infra team decided on a two-stage approach: first build out Unicorn to manage all the existing search experiences on the site and then, build out Unicorn further to meet all the requirements of Graph Search. Today, we are far enough along now to launch Graph Search as a beta, but what we’re still missing is the ability to index all of the posts and comments people have shared on Facebook–they make up by far the biggest dataset we have for Graph Search and Unicorn.
At a very rudimentary level, Rasmussen explained in a later call, when we type a query such as “Restaurants liked by my friends from India in San Francisco,” an aggregator tool uses natural language processing to parse the query into keywords, against which searches are conducted. So “Restaurants,” “San Francisco,” and “Friends from India” become queries whose results are fetched and then sorted to give us the final answer. The approach is not new, except that the scale of data on which such queries are being run is possibly a new (and rising) high watermark.
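To make that description a little more concrete, here is a toy Python sketch of the fetch-and-sort step as I understand it. It assumes the natural language parse has already turned the question into three terms, and everything in it (the term names, the object IDs, the scores, the `search` function) is invented for illustration. It is not Unicorn's actual design, just the general shape of combining several lookups against an index and ranking what survives.

```python
# Toy illustration only: the index contents, term names, and scores are made up,
# and this is not how Unicorn is implemented.

# Inverted index: each term maps to the set of object ids that match it.
INDEX = {
    "type:restaurant": {1, 2, 3, 4},
    "city:san-francisco": {2, 3, 4, 5},
    "liked-by:friends-from-india": {3, 4, 6},
}

# Hypothetical ranking signal, e.g. how strongly an object matches the searcher.
SCORES = {1: 0.2, 2: 0.9, 3: 0.4, 4: 0.7, 5: 0.1, 6: 0.5}


def search(terms):
    """Fetch the id set for each term, intersect the sets, and rank the survivors."""
    postings = [INDEX.get(term, set()) for term in terms]
    if not postings:
        return []
    # Only objects that satisfy every term remain.
    matches = set.intersection(*postings)
    # Highest score first.
    return sorted(matches, key=lambda obj_id: SCORES.get(obj_id, 0.0), reverse=True)


results = search([
    "type:restaurant",
    "city:san-francisco",
    "liked-by:friends-from-india",
])
print(results)  # [4, 3]
```

Only objects 3 and 4 appear in all three sets, and 4 carries the higher made-up score, so it comes first. The hard part at Facebook's scale is doing this across an index far too large for one machine, which the sketch does not attempt.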
Rasmussen acknowledged that there are a lot of unknowns for the company and that it is still not clear what kind of computing resources Graph Search will require at scale. Facebook plans to learn from the phased beta rollout and do its compute-resource planning based on that, he said.
My take: This is going to be a big infrastructure challenge, considering Lars and his cohorts want to keep latency below two seconds. Rasmussen pointed out that the “ranking algorithm” for results is going to keep improving as more and more people use the search. With as many as a billion searches on Facebook every day, even a few million queries are going to be enough to help fine-tune this ranking algorithm.
As for the rest of my questions, I guess I will have to wait for the company to share details “about this challenge on the engineering blog soon,” as Facebook said in its blog post.