How Ancestry.com transforms mounds of data into legible digital records

Sure, genealogy nerds might have fun poking through U.S. Census records, birth certificates and other documents in pursuit of information about their relatives on Ancestry.com. When it comes to showing off individual records to friends and relatives, though, the presentation can lack punch, and telling the whole story of an ancestor’s life isn’t straightforward.

The people behind the Ancestry.com service have realized this. Now they’re making the most of their 4PB storehouse of official personal records, user-submitted information and other data with a new feature delivering sleek computer-generated but customizable summaries of information available on users’ ancestors.

Ancestry started rolling out the feature, known as Story View, earlier this quarter to a tiny share of its customers, and now it’s active for 10 percent of them. The plan is to analyze the use of Ancestry with and without Story View and round out the feature before making it generally available, probably later this year, said Eric Shoup, the company’s executive vice president of product, in a recent interview. Already Ancestry has made the feature more interactive by letting users move the images of documents around a single page and edit the associated bodies of text derived from the documents.

How it works

Story View builds on top of Ancestry’s already highly evolved tools for mining data about relatives, including some handwritten records. But sometimes only critical fields, such as name and place of residence, have been processed for inclusion in Story View. A customer can access a handwritten record, scroll down to the row in which a relative is described and toggle across columns to see data that hasn’t been processed, such as that person’s occupation.

Ancestry is working on getting more out of its handwritten records by gradually directing armies of “keyers” to decipher handwriting and turn it into searchable text. Street addresses have been added in this way, and other fields will come later. And since Ancestry continues to add records to its repository, life stories will gain more sources to draw from as well.

The Story View life summary for one of Shoup's relatives

To generate one-paragraph summaries drawing on information from multiple documents (check out the top paragraph in the picture above), Ancestry looked to Narrative Science, a company founded in 2010 to make machines turn out readable copy. Early use cases came in the production of coverage of sports events and public companies’ earnings reports, but now Narrative Science technology is handling much more personal information.

When Ancestry first got involved with Narrative Science, it was only possible to produce data in big batches, said Reed McGrew, lead developer on Ancestry’s narrative and context services team. “They’ll produce huge numbers of financial reports, and that’s not really the experience we’re trying to deliver,” McGrew said. “Because it was meant for batches, it was pretty slow.”

Within a few months, Narrative Science came out with a new API that could work on a more granular level. “On kind of a user-by-user basis, they generate our life stories,” McGrew said.

Ancestry knows a thing or two about serving up genealogy information. The company’s editors provided editorial standards, or “rules,” for how the data should inform the narratives and how the narratives should sound, McGrew said. One Ancestry standard? “We don’t talk about births that happen to mothers less than 10 years old,” he said. “They’re more likely keystroke errors. They do happen in reality sometimes, for sure, but more often than not, when we find them, they’re errors.”
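A rule like the one McGrew describes can be pictured as a simple plausibility check applied before a fact reaches the narrative. The sketch below is only an illustration, not Ancestry’s or Narrative Science’s code: the `BirthFact` type, its field names and the decision to keep facts whose mother’s birth year is unknown are all assumptions, and comparing years alone only approximates the mother’s age.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold quoted in the article: births to mothers under 10 are not narrated.
MIN_MOTHER_AGE = 10


@dataclass
class BirthFact:
    """Hypothetical record of a birth; the field names are illustrative."""
    child_name: str
    child_birth_year: int
    mother_birth_year: Optional[int]  # None when the mother's birth year is unknown


def is_plausible_birth(fact: BirthFact) -> bool:
    """Return False when the implied mother's age falls under the threshold."""
    if fact.mother_birth_year is None:
        # Assumption: with no birth year for the mother, nothing contradicts the fact.
        return True
    implied_age = fact.child_birth_year - fact.mother_birth_year
    return implied_age >= MIN_MOTHER_AGE


def narratable(facts: List[BirthFact]) -> List[BirthFact]:
    """Keep only the births that the editorial rule allows into a life story."""
    return [fact for fact in facts if is_plausible_birth(fact)]
```

A fact dropped this way is only kept out of the generated text. Nothing in the sketch deletes or corrects the underlying record, which matches McGrew’s point that such entries are usually, though not always, keying errors.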

One of several records containing information on a relative of Shoup's

Underneath a picture and life summary of an ancestor in Story View are zoomed-out pictures of documents, instead of discrete fields of structured text. Next to the images, Ancestry can plug in blurbs generated from information in the document. Those draw from a system that engineers drew up in house. Once Ancestry has found all the records associated with a person, it selects specific facts to pull out of them based on Ancestry editors’ rules, and assembles them into full sentences. Once the document-based blurbs are displayed in the browser, customers can edit and save them before sharing.
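The record-to-blurb step described above (pick facts according to editors’ rules, then assemble them into sentences) might look roughly like the following. This is a minimal sketch under stated assumptions: the `RULES` table, the record types, the field names and the sentence templates are invented for illustration, and the article does not describe how Ancestry’s in-house system actually represents them.

```python
from typing import Dict, List, Optional

# Hypothetical editorial rules: for each record type, which fields to pull
# out and the sentence template to fill with them.
RULES: Dict[str, Dict[str, object]] = {
    "census": {
        "fields": ["name", "year", "residence"],
        "template": "{name} was living in {residence} in {year}.",
    },
    "birth": {
        "fields": ["name", "year", "place"],
        "template": "{name} was born in {place} in {year}.",
    },
}


def blurb_for(record: Dict[str, str]) -> Optional[str]:
    """Build one sentence from a record, or None if no rule applies."""
    rule = RULES.get(record.get("type", ""))
    if rule is None:
        return None  # no editorial rule covers this record type
    facts = {field: record.get(field) for field in rule["fields"]}
    if any(not value for value in facts.values()):
        return None  # a required field has not been keyed yet
    return str(rule["template"]).format(**facts)


def blurbs_for_person(records: List[Dict[str, str]]) -> List[str]:
    """Generate blurbs for every record of a person that a rule can handle."""
    blurbs = []
    for record in records:
        blurb = blurb_for(record)
        if blurb is not None:
            blurbs.append(blurb)
    return blurbs
```

In this sketch a record with an unkeyed field simply produces no blurb, so a sentence would appear only once the keyers have filled in that field. The generated strings are plain text, which is what would let a customer edit and save them in the browser before sharing.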

Sharing ain’t easy

The challenge is not the creation and storage of new data and websites that users create, said Scott Sorenson, Ancestry’s chief information officer. Storage has gotten cheaper and cheaper, and that trend should continue. Accurate processing of handwritten records generally is not an issue, either. Often the keyers are in China, Sorenson said. “The Chinese character set is much larger than our alphabet,” he said. “They’re actually very skilled at keying these records.”

The really hard part is keeping the service highly available: serving up all the right documents and text for millions of users and keeping the site from crashing when traffic peaks. But since one goal of Story View is to get more people checking out content on the site and eventually signing up, that would be a good problem to have.


GigaOM