How Unbound Concepts crunches readability, or how big data could help improve literacy

It’s easy enough to understand why Curious George books are first grade fare and why Camus is often saved for the 12th grade. But what about all the books in between?

For the most part, teachers and literacy experts are the ones charged with considering sentence complexity, word difficulty, themes and other characteristics to judge the readability of text. But by turning over much of that analysis to algorithms, startup Unbound Concepts believes it can not only assess more text with more granularity but also individualize education for K-12 students and potentially even power digital content that adapts to readers, whether they’re in the K-12 classroom or beyond.

“The idea here is that Google has the biggest corpus of language in the world and they can do things like tell you the difference between pebbles and rocks and Little Rock and Arkansas and the Ark of the Covenant,” said Benjamin Bengfort, the company’s CTO. “We want to be able to do the same thing but instead of our corpus being the web, we want it to be formal literature, particularly K-12 books [and then] medical texts and legal texts.”

In the short term, the company said, it wants to use its corpus to be able to help teachers identify the best books for each of their students. And that’s an especially big opportunity now given the rise of the Common Core standards in most states and teachers’ increasing need to find content that meets those standards. But, over time, Unbound Concepts has even bigger ambitions for its database of leveled books: as an engine for adaptive digital reading platforms that provide content best matched to student mastery.

Quantitative content assessments are insufficient

Launched in 2011, Unbound Concepts is part of New York ed tech accelerator Socratic Labs’ inaugural class of startups. Katie Palencsar, the company’s co-founder and CEO, said the idea for the company grew out of her experiences in education, both as a classroom teacher and school facilitator.

She said that she and other teachers, trying to find the best books for their students, would scramble to ask peers for their opinions, but matching content with the needs of each student can be time-consuming and cumbersome.

Teachers aren’t left entirely to their own devices; for example, the Lexile Framework for Reading, a widely used educational tool, helps teachers identify appropriate reading content. But it’s based on quantitative factors, like sentence length and syllables per word, instead of the content itself.
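To make concrete what a purely quantitative measure looks like, here is a minimal Python sketch of the Flesch-Kincaid grade level, a well-known public formula that uses only words per sentence and syllables per word. It is not the Lexile Framework's method, which the article does not detail, and the vowel-counting syllable heuristic and the two sample passages are assumptions made for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


# Made-up sample passages, for illustration only.
simple = "George is a good monkey. He is very curious."
dense = (
    "The absurdity of existence confronts every individual "
    "who contemplates mortality."
)

print(round(flesch_kincaid_grade(simple), 1))
print(round(flesch_kincaid_grade(dense), 1))
```

A formula like this sees only the surface of the text. It cannot tell whether a short, simply worded passage deals with a theme that is beyond a young reader, which is the gap Palencsar describes.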

“It’s a difficulty in education,” Palencsar said. “If you’re thinking of readers and what their needs are, you need to consider the semantics and the meaning and the skills.”

Taking a Netflix approach to predicting reading levels

The company’s first product, an iOS app called Bookleveler, gives teachers a platform for crowd-sourcing leveling recommendations. From the app, they can scan book ISBNs to see how the books rank on an A-Z reading scale and submit their own reading levels to help other instructors. But, the company said, the point of the app was also to amass training data that could lay the groundwork for a more sophisticated analysis engine that now powers an API for predicting book levels as well as an online recommendation tool for educators and publishers. So far, Bookleveler has been downloaded more than 3,500 times and has generated about 35,000 data points, so the startup has a long way to go before its corpus comes anywhere near Google’s in size. But that’s just since the app launched in October, and the company said traction is rising steadily.
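As a rough illustration of how crowd-sourced submissions on an A-Z scale might be combined into one level per book, the Python sketch below takes the median of the letters submitted for each ISBN. The article does not say how Bookleveler aggregates its data, so the median rule, the function names and the placeholder ISBNs are all assumptions.

```python
from collections import defaultdict
from statistics import median_low


def level_to_index(level: str) -> int:
    """Map a letter on the A-Z scale to 0-25."""
    level = level.strip().upper()
    if len(level) != 1 or not "A" <= level <= "Z":
        raise ValueError(f"not an A-Z level: {level!r}")
    return ord(level) - ord("A")


def index_to_level(index: int) -> str:
    """Map 0-25 back to a letter on the A-Z scale."""
    return chr(ord("A") + index)


def consensus_levels(submissions):
    """Return one level per ISBN from (isbn, level) pairs.

    Hypothetical aggregation rule: the low median, so the result is
    always a level that some teacher actually submitted.
    """
    by_isbn = defaultdict(list)
    for isbn, level in submissions:
        by_isbn[isbn].append(level_to_index(level))
    return {
        isbn: index_to_level(median_low(indices))
        for isbn, indices in by_isbn.items()
    }


# Placeholder identifiers and made-up submissions, for illustration only.
submissions = [
    ("isbn-0001", "J"),
    ("isbn-0001", "K"),
    ("isbn-0001", "M"),
    ("isbn-0002", "C"),
]

print(consensus_levels(submissions))
```

A median is less sensitive than a mean to one outlying submission, which matters when a book has only a handful of ratings.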

Much like the Netflix Prize challenged engineers to use user ratings to create algorithms for predicting movie ratings, Bengfort said Unbound Concepts uses machine learning and natural language processing to power its Meridien API, which predicts levels for K-12 books. As opposed to traditional quantitative analysis tools (like the Lexile Framework) that tend to include fewer than a dozen factors, he said, Meridien considers 140 features. It analyzes content along quantitative dimensions (sentence depth and breadth, etc.) but also along other vectors, including themes and topics.

With Unbound Concepts’ tools, for example, a teacher could search for a book that is appropriate for remedial first graders, features dogs, includes examples of alliteration, involves the theme of camaraderie and hits the right emotional notes. Down the road, Unbound Concepts believes it has a shot at using its database and algorithms to power adaptive learning platforms that assess students’ mastery as they read and serve up the most appropriate content. On that front, the company will have plenty of competition, from publishing companies like McGraw-Hill and Pearson to startups like Knewton. And, like the rest of that quickly crowding field, it will have to prove to educators that it’s more than hype.

But for now, the company, which has raised an undisclosed amount of funding and is currently raising more, is focused on the immediate problem of helping teachers get the best content to their students. It’s getting ready to pitch its technology to publishers, who could license the software to better label books, and it could also license that technology directly to schools.

“The quantitative metrics aren’t going to cut it,” said Bengfort. “We need individualized levels… we want our product to be a research scientist sitting in front of a particular student saying this is a good book for you.”