Data privacy, machine learning and the destruction of mysterious humanity

Recently, I wrote an article about Disney’s new RFID location- and transaction-tracking technology, the MagicBand. Perhaps more magical for Walt than it is for you, the band allows Disney to track their customers’ actions inside their parks (and possibly outside). Where you walk, what you eat, when you stop to borderline-abusively yell at your kids. All that magic gets tracked.

Well, weeks after my trip to Disney — the rides, the churros, the vomiting and the tears — I found myself still mulling over this data privacy trade-off. Why do we make this trade? What are our reasons? Ultimately, people are willing to trade their data with companies like Disney for a couple of reasons.

What’s the worst that could happen?

First, humans are bad at discerning the value of their data. Personal data just appears out of nowhere, exhaust out of life’s tailpipe, so why not trade it for something small? I’m personally willing to hand over my own GPS location full time just so I have access to Angry Birds and a smartphone flashlight app.

Our brains evolved to assess trade-offs best in the face of immediate, physical needs and threats. Should I run from that predator? Absolutely. Unfortunately, we still have these same brains. That’s why the camel crickets in my crawl space make me flip out, but giving my kids’ data to Disney World feels perfectly acceptable.

Second, most of us feel that giving our data over to a private corporation, like Disney or Facebook or Google, has limited scope. They can only touch us in certain places (e.g., their parks, their websites). And what’s the worst those parks and websites are going to do? Market crap to us.

Feels low risk. No big deal, because the power lies with me, the purchaser, to act. Right? Well, while they increase our happiness, these companies may be doing nothing short of destroying humanity as we know it.

Now, that’s an outrageous claim. But this is an opinion piece, so as Robert Redford put it so eloquently in Sneakers, “It’s my dime. I’ll ask the questions.”

John Foreman at MailChimp HQ. Source: Derrick Harris

Hacking our decisions using data

While advertisers have long made emotional appeals for our dollars, they can now bring our personal data to bear on the problem. We can think of advances in ad targeting as increases in image resolution.

In the beginning, advertisers had a single dry ad. They didn’t know or target you, the consumer, all that well. The picture they had of you as a consumer might as well have been a stick figure drawn in crayon.

Then came demographic targeting and focus grouping. All of a sudden, the stick figure got some detail. Maybe some hair got drawn on the stick figure, a briefcase, some nether-regions, a dog at the stick figure’s feet.

Then data aggregation and tracking came on the scene. The caricature started to gain actual real-life pixels and features — shopping cart data, IP geolocation, MAC address tracking, user-agent string parsing, social data, etc.

So where does this increasingly realistic picture of the consumer go from here? This data inevitably has gaps. And while many of those gaps will be filled by better and more varied sensors (mobile data, connected automobiles, Jawbone, Nest, etc.), there’s another tool for filling them in: machine learning.

Data left online and in the real world form anchor points in the photo of you, from which machine learning algorithms can project the rest of your image. And as machine learning models grow in accuracy and sophistication, particularly at companies with an incentive to target ads, so does the interpolated image of exactly who you are.
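
To make that interpolation concrete, here is a minimal, purely illustrative sketch in Python. Everything in it is invented for the example: the behavioral signals, the labels, and the notion that three crude features would suffice. The point is only the mechanics: a model trained on users who did disclose an attribute will happily fill in that attribute for a user who never did.

```python
# Illustrative only: invented signals standing in for tracked behavior.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: [late-night sessions per week, fitness-page visits, fast-food check-ins]
observed_signals = np.array([
    [9, 0, 7],
    [1, 6, 1],
    [8, 1, 5],
    [2, 7, 0],
    [7, 0, 6],
    [0, 5, 1],
])
# 1 = disclosed an appetite for in-app purchases, 0 = did not
disclosed_label = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(observed_signals, disclosed_label)

# A new user who never answered the question, only left behavioral anchor points
new_user = np.array([[6, 1, 4]])
print(model.predict_proba(new_user)[0, 1])  # the model's guess at the missing attribute
```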

This is where Facebook and Google are investing enormous sums. Recruiting directly from the professor pool, these companies are snapping up the top machine learning minds in the world, such as Facebook’s recent hire of Yann LeCun to lead a new AI lab.

So if the story of advertising in recent years has been one of disingenuous emotional appeals from the Dos Equis man, the story of the future of advertising will be one of laser-guided disingenuous arguments.

Your posts online betray your burgeoning interest in home brews, your medical issues, your fears, your fascinations, your willingness to spend, your crusade against gluten, your insecurities. And if you can dash a faint line between a question and the data breadcrumbs you scatter willy-nilly, you’d better believe a model can fill that line in with Sharpie.

If an AI model can determine your emotional makeup (Facebook’s posts on love certainly betray this intent), then a company can select from a pool of possible ad copy to appeal to whatever version of yourself they like. They can target your worst self — the one who’s addicted to in-app payments in Candy Crush Saga. Or they can appeal to your aspirational best self, selling you that CrossFit membership at just the right moment.
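
The selection step itself is mundane. As a hedged sketch (the segments, scores and copy below are all made up for illustration), it amounts to scoring a user against each version of themselves and serving whichever message the model expects them to act on:

```python
# Illustrative only: invented segments, scores and ad copy.
# Serve whichever message the model scores as most likely to convert.
predicted_click_probability = {
    "impulsive_self": 0.62,     # e.g., in-app purchase prompts
    "aspirational_self": 0.48,  # e.g., gym-membership pitches
    "anxious_self": 0.31,       # e.g., fear-based pitches
}

ad_copy = {
    "impulsive_self": "Just 99 cents to keep playing right now.",
    "aspirational_self": "New year, new you. First class is free.",
    "anxious_self": "Are you sure your family is protected?",
}

target = max(predicted_click_probability, key=predicted_click_probability.get)
print(ad_copy[target])  # the "version of yourself" the advertiser chose to address
```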

In the hands of machine learning models, we become nothing more than a ball of probabilistic mechanisms to be manipulated with carefully designed inputs that lead to anticipated outputs.

The famous neurologist Viktor Frankl once said, “A human being is a deciding being.” But if our decisions can be hacked by model-assisted corporations, then we have to admit that perhaps we cease to be human as we’ve known it. Instead of being unique or special, we all become predictable and expected, nothing but products of previous measured actions.

The promise of better machine learning, then, is not to bring machines up to the level of humans but to bring humans down to the level of machines.

Data scientists and their monsters

Who’s to blame for this sad state of affairs? Arguably, the not-so-humble data scientist.

Data scientists are demi-Christs. They are half-human, themselves targets of their own and other organizations’ machine learning models. Their own freedom as “deciding beings” is eroded by their products. And yet they are also half-god, the creators of these faux-sentient models.

Data scientists, not unlike Dr. Frankenstein, create unholy life by surging electricity in the form of computation through the sloughed-off data skin of society. But similar to the monster in Mary Shelley’s novel, there will always be unintended consequences. Just as Frankenstein’s monster could not shake its criminal past, so these machine learning models for all their advances cannot shake the past data they are trained on.

Models learn a behavior, a tendency, a personality, a propensity from past data and then they predict that thing they’ve learned with cold accuracy. But in bringing past personal data into present predictions, these models are like echo chambers, reinforcing past truths in the present.

And these echo chambers can reinforce societal problems. This is a concern with Chicago’s crime hotspot targeting model. What happens when a model shits where it eats? Police focus in on a hot spot and generate more arrests there. Those hotspots become hotter. The neighborhood gets less desirable. Education and jobs suffer. Those hotspots become hotter. The model sends more police. And on and on the death spiral goes.
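
A toy simulation makes the spiral visible. All the numbers below are invented, and the “model” is nothing more than each neighborhood’s share of past arrests, but the mechanics are the point: patrols follow predictions, arrests follow patrols, and next year’s predictions are trained on those arrests. A one-arrest difference between two otherwise identical neighborhoods hardens into a permanent hotspot.

```python
# Illustrative only: a toy feedback loop, not a real predictive-policing model.
neighborhoods = ["A", "B"]
arrest_history = {"A": 11.0, "B": 10.0}  # nearly identical starting points

for year in range(5):
    total = sum(arrest_history.values())
    # "Model": predicted risk is each neighborhood's share of past arrests
    predicted_risk = {n: arrest_history[n] / total for n in neighborhoods}
    hotspot = max(predicted_risk, key=predicted_risk.get)
    # The predicted hotspot gets a patrol surge; everywhere else gets the baseline
    patrols = {n: (70 if n == hotspot else 30) for n in neighborhoods}
    # More patrols surface more arrests, which become next year's training data
    for n in neighborhoods:
        arrest_history[n] += 0.5 * patrols[n]
    print(year, hotspot, {n: round(predicted_risk[n], 2) for n in neighborhoods})
```

By the final iteration, neighborhood A accounts for roughly two-thirds of all recorded arrests, not because the neighborhood changed, but because the model kept dispatching the patrols that generate its own training data.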

As machine learning on top of personal data is used to dissect us ever more finely, the odds of someone breaking out of the margins of society decrease. Models say they’re a credit risk, so they’re fed terrible loans. Poverty is reinforced. Data says they’ll like Skittles, so they’re advertised Skittles. Obesity is reinforced. And rainbows.

They’re predicted to like bottom-of-the-barrel culture, so they’re sold Ed Hardy. We all suffer.

These models become the mathematical equivalent of Javert in Les Miserables refusing to allow Jean Valjean’s redemption. They are data-laundered discrimination.

Let us eat and drink, for tomorrow we’re modeled

This past year, Mark Zuckerberg attended one of the big AI conferences, Neural Information Processing Systems (NIPS). This is kind of like David Bowie stopping at your house to catch up on some Game of Thrones with you.

Why’d Zuck go to NIPS? To learn, to recruit, to cozy up to the machine learning community. Because Facebook is invested in the dismantling of its users piece by piece, using data and machine learning, to process humans into a segmentation-ready data slurry that’s more palatable to its customers, the advertisers.

I attend a lot of conferences on these topics. There’s an excitement in the air. Machine learning and other analytics techniques have been reinvigorated by the business applications of combining AI with distributed computing and large data sets. I like to imagine that at these conferences I’m feeling a smidgen of what it was like to attend one of the earliest World’s Fairs.

But it’s an open question what these technologies will become. Are we birthing our own psychic destruction? Maybe.

Or maybe, like the characters in Disney’s WALL-E, we’ll all end up too fat to get our MagicBands off, surrounded by crap we don’t want but were too well targeted to pass up.

This post is an excerpt of a full version that ran Saturday morning on John Foreman’s blog. Foreman is chief data scientist at MailChimp and author of the book Data Smart: Using Data Science to Transform Information into Insight.

Feature image courtesy of Shutterstock user Sebastian Kaulitzki.
