No longer the new technology on the block, big data continues to generate significant buzz. Technologies such as Hadoop and HBase are seeing rapid growth, analysts are experimenting with new techniques and approaches, and business leaders are adapting their business models to rely more on the power of big data. McKinsey calls big data the “next frontier” for business, with the potential to transform business in the same way the Internet did over the past 15 years.
To take advantage of that potential, business leaders need to know what steps to take in order to make maximum use of their data asset. Business success with big data is not just about choosing the right cloud technologies or hiring smart data scientists – it’s about creating a business-centric approach that connects a company’s data to its business strategy, enables continual improvement, and follows through to impact processes, margins, and customer satisfaction.
In my experience with big data, I have developed seven steps that can help any business leader drive more success with big data initiatives.
1. Create a strategy for your data
Your data needs a strong strategy, one that connects to and underpins your business strategy and also integrates with department-level accountabilities. Develop a plan for where you want to be at each milestone, define your future capabilities clearly, and describe how your data capabilities will be utilized. Then hold your users accountable to where and how they will use the new capabilities, and what business impact they will drive.
For example, if you can use big data to improve in-store sales, you need to not only work with store managers to define capabilities that connect to their strategy, but also ensure the managers are held accountable to using the data capabilities correctly and delivering the intended impact. Doing so connects both your and their goals to the overall business strategy, helps create more usable capabilities, and ensures any needed iterations will be done jointly. A well-conceived data strategy will give you the most bang for the buck on your data investment.
2. Design for agility
Big data systems are just that…big, which means they tend to be inflexible. A great BI system, by definition, will cause a business to change, which in turn will require the BI system to change. Thus, your systems need to adapt quickly to keep pace with your business. 12- or 18-month release cycles are appropriate for certain parts of your system, but 3- or 6-month cycles may be appropriate for others. Carefully analyze each component of your big data system, and design for the amount of agility each one needs.
You may decide to build higher levels of automation into the layers of your stack that change slowly, while reserving configuration-based approaches for layers of your stack that need to change quickly. In general, the top layers of your stack (e.g., user interfaces and reporting tools) need to be more agile than the bottom layers of your stack (e.g., data collection and storage), but many exceptions to this rule exist. Only careful analysis and understanding of your current and future uses of data will enable you to make the right decisions on agility. Designing for agility will enable your big data investment to keep pace with, and even lead, your business.
3. Understand latencies
Latency is a challenge in traditional BI systems, and big data only amplifies the problem. Big data solutions tend to be architected first as batch systems, with lower-latency capabilities addressed afterwards. Don’t save latency for last – analyze your key use scenarios in terms of latency, and connect them clearly to business drivers. Focus on delivering the right latency for each need, weighed against the business value it drives, and let those needs drive your design. Certain low-latency needs may require bypassing your big data system temporarily, with systems sharing data directly in order to deliver specific scenarios.
For example, if your customers tend to interact with system A and system B in parallel or in quick sequence, these two systems may need to share data directly. The data can then be written into the big data system in time to be used by other systems. Delivering data on a real-time or near-real-time basis can be very expensive; thus, it’s better to think in terms of “right-time” data targeted to each need.
Describe latency requirements in detail, and ensure the business justification is sound. Understanding latencies will enable you to deliver data exactly when it’s needed, while keeping costs under control.
4. Invest in data quality and metadata
Data quality in any system is a constant battle, and big data systems are no exception; however, big data systems require much more automation and advance planning. You should first ensure that data quality is not treated as a project or initiative, but as a foundational layer of your data stack that receives adequate resourcing and management attention. Second, build in multiple lines of defense – from data mastering (where, for example, you are creating customer accounts) to data collection (where you are recording all of that customer’s interactions with you) to metadata (where you are organizing and dimensionalizing the data to aid in future reporting and analysis). Third, automate both the processes that identify and escalate data quality issues, and the measurement and reporting of data quality progress. Empower your data quality team with tools that solve problems at high scale, such as diagnostic and workflow tools. Efficient data quality practices will enable your big data system to earn its place as a trusted input for key business processes.
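To make the automation point concrete, here is a minimal sketch of an automated data quality check. It validates a batch of customer records against simple rules, reports a per-field pass rate (so the measurement and reporting side is automated too), and flags the failing rows for a workflow tool to pick up. The field names, rules, and sample records are all illustrative assumptions, not part of any specific system.

```python
# Hypothetical automated data-quality check: validate records against
# simple per-field rules and report pass rates plus failing rows.
import re

# Illustrative rules; a real system would load these from configuration.
RULES = {
    "customer_id": lambda v: bool(v),                       # must be present
    "email": lambda v: bool(re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", v or "")),
    "country": lambda v: v in {"US", "CA", "GB", "DE"},     # known dimension values
}

def quality_report(records):
    """Return per-field pass rates and the rows that failed any rule."""
    failures = []
    passes = {field: 0 for field in RULES}
    for i, rec in enumerate(records):
        bad_fields = [f for f, rule in RULES.items() if not rule(rec.get(f))]
        for f in RULES:
            if f not in bad_fields:
                passes[f] += 1
        if bad_fields:
            failures.append((i, bad_fields))
    n = len(records) or 1
    return {f: passes[f] / n for f in RULES}, failures

records = [
    {"customer_id": "c1", "email": "a@example.com", "country": "US"},
    {"customer_id": "",   "email": "not-an-email",  "country": "FR"},
]
rates, failures = quality_report(records)
```

The same pattern scales up: run the checks on every load, trend the pass rates over time, and route the failure list into your data quality team’s workflow.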
5. Get good at prototyping
The data sizes in most big data systems are too large to work with all at once, so it’s typically wiser to build small-scale prototypes to iron out the wrinkles and ensure you are meeting customer requirements. If you are building complex data integrations, online algorithms, or user interfaces, prototyping allows you to learn at a smaller and less costly scale. What’s more, prototypes can be shared early with your user base, which generates valuable feedback as well as excitement.
Prototyping requires a distinct set of skills that you will need to build and refine over time. Prototypers need to move quickly, learn new designs and technologies, understand user scenarios, actively solicit feedback, and not be afraid to fail. They need to be creative in their approach to solving problems while staying rooted in sound data mechanics. Why is prototyping better than wireframes or feature lists? Because prototypes are “real,” your users will give you better feedback; at the same time, you will come to understand some of the challenges you will face as you build the full-scale version. Building a strong prototyping capability will help you increase innovation and speed while reducing the cost of mistakes.
6. Get great at sampling
Sampling will save you a lot of time if you learn how to do it correctly. There are many use cases for which sampling is an effective alternative to using full census (100 percent) data. Certain needs such as creating personalized experiences for each customer, or calculating executive accountability metrics, are not appropriate for sampling. But for many other needs, sampling is a viable option.
For example, understanding product or feature performance, looking at patterns and trends over time, and filtering for unexpected anomalies can typically be done on sampled data. One approach is to collect 100 percent of the data, but do most of your analysis on samples, and then confirm important conclusions on the full data set. Once you establish process flows to pull sampled data into standard tools such as Excel and/or SQL, you will see analyst productivity increase substantially, which will save you time and money and increase the job satisfaction of your analysts.
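The sample-then-confirm workflow described above can be sketched in a few lines. This example uses simulated session durations rather than real data; the data set, sample size, and metric are illustrative assumptions, and the point is simply that the exploratory work happens on the cheap 1 percent sample while the full data set is touched only once, to confirm the conclusion.

```python
# Hypothetical sample-then-confirm workflow: estimate a metric on a random
# sample, then verify the conclusion on the full data set before acting.
import random

random.seed(42)  # reproducible for the example

# Full "census" data: 100,000 simulated session durations (seconds).
full_data = [random.gauss(mu=300, sigma=60) for _ in range(100_000)]

# 1. Do the exploratory analysis on a 1 percent sample -- far cheaper at scale.
sample = random.sample(full_data, k=1_000)
sample_mean = sum(sample) / len(sample)

# 2. The sample suggests mean duration is near 300s; confirm on the full data.
full_mean = sum(full_data) / len(full_data)

# Sampling error should be small relative to the metric itself.
relative_error = abs(sample_mean - full_mean) / full_mean
```

In practice, the confirmation step runs only for conclusions that will drive a decision, which is exactly where a statistician should verify that the sample was drawn correctly and the result is not being misapplied.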
To get great at sampling, you need to do three things: first, develop standard sampled data sets that help your analysts address large swaths of business questions, updating them regularly; second, make sure you have at least one highly qualified individual (i.e. a statistician) who can ensure the data is being sampled correctly and results are not misapplied; and third, educate decision makers on the benefits and limitations of sampling so they can get comfortable making decisions with sampled data. The effective use of sampling increases productivity while delivering equivalent business value.
7. Ask for regular feedback
Big data is a learning process, both in managing the data and in driving business value from its contents. Your internal user base is a valuable source of feedback and integral to your learning and development process. Your prototyping program will be one source of feedback, but you should also survey your users and benchmark your progress over time. Usability, data quality, and data latency are all areas in which users will give you feedback. In addition, ask for ad hoc feedback from every level of your stakeholder organizations so they see your commitment to making their business better.
As your data asset’s reputation grows, your stakeholders will give you more and better feedback, which will allow you to develop integrated goals and roadmaps, and drive more business benefit as a result. Regular feedback ensures your big data system is tightly integrated into business decision making, so it can play a lead role in business improvement.
Following the above steps will help you build more effective big data capabilities, saving you time and money, and driving maximum ROI for your business. The big data frontier is here; breaking through it requires an understanding of which steps will help you drive the most impact.
Chad Richeson is the CEO of Society Consulting, a Seattle-based analytics and technology consulting firm that provides business-driven data strategies, solutions, and analytics for its clients. Before joining Society Consulting in 2011, Chad spent 12 years at Microsoft driving analytics and big data solutions for Bing, MSN, Mobile and AdCenter.