Data scientists matter because data science is the future of IT

Data scientists are changing how decisions get made by making better use of big data. Rather than looking for ways around them, we need to make data science more accessible as a profession and give data scientists easier tools.

Kevin Kelly, in “Better Than Human,” tells us how the future is going to go down. As we increasingly automate existing occupations, we create new jobs in order to instruct and direct those robots. We build robots to take over the instructional positions, and create new jobs that set parameters and develop feedback loops. We build new systems that are flexible and dynamic and create more new jobs — such as data scientists — to analyze and build models for these new systems. It is obvious that in such a world, where static models cannot keep up, data scientists will be indispensable.

Given this future, the argument presented in a recent GigaOM post, “We don’t need more data scientists – just make big data easier to use,” misses some key points. Its premise, that we need simple ways to deliver big data to business decision makers, is correct in that we need better tools, yet it overlooks an important distinction about who will use these tools. To complete author Scott Brave’s analogy to web content systems, data scientists are the designers and the content creators of today, not the software engineers or the IT bottleneck.

Every organization will need someone wearing the data scientist hat, just as every organization has people responsible for product, sales, marketing and support. Unfortunately, to date, the tools available to data scientists have been rudimentary. Data scientists have had to learn diverse and complex computer languages for working with data. That world is changing as we create simpler ways for data scientists to use big data.

Even recommendations are more than just recommendations

Retail recommendations are a basic example of why we need data scientists today. A retailer sees a return on investment by showing any suggestions to its customers, whether or not the recommendations are relevant. That has been true ever since magazines and gum first flanked the checkout line. Data scientists can do better, but not by creating more sophisticated mechanisms for recommending the same gum and magazines. Recommendations are a very crude use of data.

As veteran retailers know, the art of cross-selling and upselling is not as simple as showing a customer what other customers bought or a list of related items. Data scientists are now building models that anticipate our needs. They are finding new ways to delight us and define experiences that create natural guides toward ideal outcomes. Soon, we will not need recommendations; with well-defined models, the subsequent decisions and actions will be obvious and then they will be automated.

Consider a retail experience that goes beyond recommendations. The elders among us recall once having delightful interactions with a local shopkeeper. The teller was the owner and knew every item in the store and every customer in town. The owner was able to personalize in his head, matching each customer to precisely what they needed. We lost this experience when we started to optimize for price and diversity over experience.

Using data analysis, we no longer need to choose between efficiencies and experiences. Instead, we expand the definition of efficiency to include customer experience. In order to stay competitive, a retailer can no longer focus just on the efficiency of moving products to the right locations. Competitive retailers are building a deep understanding of how customers use their products. Efficiency means matching the right product to the right customer at the right time, even as both evolve.

And that’s the easy part

Retail is a tangible example because we are all consumers, but retail is not the most instructive example of why data science is becoming such an important field. The fields of health care, education and energy are all evolving at a rapid pace:

  • We are discovering new drugs and new techniques, and finding new ways to measure how to best care for patients.
  • The ubiquity of laptops and tablets, the emergence of online courses and the thirst for ongoing education are creating new opportunities to measure (not test) how students learn and refine how we teach.
  • Energy is on the cusp of a radical shift — from oil to gas, from fossil fuels to renewable energy, from manual heating, cooling, driving and flying to automated, fine-tuned and efficient use of the energy we produce.

We cannot begin to conceive of static models that will track and analyze these advances. We cannot replace data scientists with vertically defined, highly segmented solutions that deliver slightly better analytics to existing decision makers. Instead, we need to develop new creative thinkers and give them high-level tools to help them apply detailed data-driven models across a range of challenges.

Like storytellers, data scientists embody the heart and soul of an organization and find ways to make it better. Every organization is going to employ someone whose responsibility is to use data to drive automated decision systems. With time, the decisions these data scientists make will become obvious and can be automated. Today’s decision makers will then get to spend their time on more important jobs we haven’t even thought of yet.

We need data scientists, and we need hundreds of thousands of them. They will do their magic, create new ways of experiencing life, products and services and, as Kevin Kelly says, “dream up new work that matters.”

Omer joined WibiData in 2012, having worked for three years at Cloudera bringing Hadoop to the enterprise market. He was previously at Vertica (now HP), where he led the Field Engineering team. Omer holds a B.Sc. in Computer Engineering from Tufts University and was a visiting scholar at Oxford University reading in Computation and Engineering, focusing on architectures for large-scale distributed systems.



GigaOM