Data scientists today are glamorized, heralded as the missing pieces in organizations that help make sense of all the data. In reality, however, their jobs are anything but glamorous: These professionals can spend as much as 80 percent of their time tediously collecting, cleaning, and preparing data for analysis. (Greta Roberts of Talent Analytics noted this challenge in Why Pay for Expensive Data Scientists?)
That means your data science team only gets to spend 20 percent of their time — equivalent to just one day each workweek — actually providing insights.
Raise their analysis time to two days per week and you’ve doubled their productivity.
With some determination and cooperation, executives including CMOs and CIOs can unlock this additional value. The secret is to create policies and systems that keep data cleaner and more consistent. That does take some work, but your data scientists will reward you with additional insights and more accurate predictions.
Why Your Data Needs So Much Work
“Data cleanup is one of those things that all data scientists dread,” said Chris Doyle, Director of Pricing, Promotions, and Analytics at Verizon. “Different data can have different problems, which is why it’s so frustrating for some people because there’s no clear roadmap — you have to discover those issues yourself.”
There’s a lot to data cleanup: It involves removing duplicate data, formatting it properly, renaming data that may have been manually input, re-coding data that updates over time, aggregating it into groups based on changing segmentation definitions, and more, Doyle said.
“It takes a lot of time because you rarely know the cleanup methods that are necessary” for any particular dataset, he said.
For example, you might receive data from one of your vendors that traditionally uses dashes as separators. Your data scientists develop a query that automatically works with that data, only to realize that in one instance, your vendor has sent you data that uses periods instead. The change breaks the query and forces the team to burn time developing and deploying a new one.
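One way to make a pipeline less brittle against this kind of change is to normalize separators before the query ever sees the data. The following is a minimal sketch, not a description of any particular team's process. The function name `normalize_code`, the set of separator characters, and the sample values are all assumptions made for illustration.

```python
import re

# Assumption: fields are groups of letters/digits joined by some separator.
# Any run of these characters is treated as one separator.
_SEPARATORS = re.compile(r"[.\-_/\s]+")


def normalize_code(raw: str, sep: str = "-") -> str:
    """Rewrite a delimited field so it always uses `sep` between its parts."""
    parts = [p for p in _SEPARATORS.split(raw.strip()) if p]
    return sep.join(parts)


# Both vendor formats end up in the same canonical form.
print(normalize_code("2014-03-17"))
print(normalize_code("2014.03.17"))
```

A step like this belongs only on columns known to hold identifiers or dates. Applied to a decimal number, it would wrongly treat the decimal point as a separator.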
So why not just bypass the time-consuming data cleaning process by running analysis on the raw data? Doing so can cause big problems.
“For example, analysis may use revenue as either an input or dependent variable. But if you are observing revenue over time, you have to account for inflation,” Doyle said. “If I don’t adjust for inflation, it will make my revenue look artificially better during recent time periods. This could lead to incorrect insights and eventually to unprofitable business decisions.”
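The adjustment Doyle describes is usually done by rescaling each period's revenue with a price index, so that every figure is expressed in the prices of a single base period. Here is a minimal sketch of that arithmetic. The function name `to_real_revenue` is hypothetical, and the revenue and index numbers are invented round figures, not actual inflation data.

```python
def to_real_revenue(nominal, price_index, base_period):
    """Express each period's revenue in the prices of `base_period`.

    real = nominal * (index in base period / index in that period)
    """
    base = price_index[base_period]
    return {
        period: amount * base / price_index[period]
        for period, amount in nominal.items()
    }


# Invented numbers for illustration only: prices rose 10% between periods.
nominal = {2010: 100.0, 2014: 110.0}
price_index = {2010: 100.0, 2014: 110.0}

real = to_real_revenue(nominal, price_index, base_period=2014)
print(real)
```

In this made-up case, revenue appears to grow 10 percent in nominal terms, but both periods come out equal once prices are held constant. The apparent growth was entirely inflation.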
Executives and vendors can make key decisions to reduce the time data scientists spend on this process, leaving them with more time for data analysis. It begins with creating a culture of data governance. (Also read 5 Steps for Using Analytics to Transform Your Business.)
“Everyone is defining things differently, which is making it really hard for us to reconcile data sources and pull them together. Some apps are trying to find ways to work together, but right now it’s the wild, wild west,” Doyle said.
Marketing, Doyle noted, holds the keys to many relationships with vendors. The marketing group needs to push for quality standards in the data these vendors send. This effort includes updating contracts to include the delivery dates of different pieces of information, requirements for the data structure, and punitive measures if the vendor doesn’t meet those requirements.
IT has responsibilities, too, Doyle said. The CIO should assign or hire someone to act as the head of data governance. This person would be responsible for creating and enforcing rules to ensure clean, high-quality data generated inside the company.
Manual data input can introduce typos as well as inconsistencies. An employee entering state data, for example, might abbreviate Massachusetts as MA, Ma., or Mass. To eliminate this variability, the head of data governance might advise replacing blank fields with drop-down menus for data input.
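For data that has already been entered through free-text fields, the same idea can be applied after the fact: map every known variant to one canonical value and flag anything unrecognized for review. The sketch below assumes a small hand-built alias table. The names `STATE_ALIASES` and `normalize_state` are hypothetical, and a real table would need to cover every state and the variants actually seen in the data.

```python
from typing import Optional

# Assumption: a hand-maintained table of known variants, keyed in lowercase
# with periods removed. Only Massachusetts is shown here.
STATE_ALIASES = {
    "ma": "MA",
    "mass": "MA",
    "massachusetts": "MA",
}


def normalize_state(raw: str) -> Optional[str]:
    """Return the canonical state code, or None if the entry is unrecognized."""
    key = raw.replace(".", "").strip().lower()
    return STATE_ALIASES.get(key)


for entry in ["MA", "Ma.", "Mass.", "Massachusets"]:
    print(entry, "->", normalize_state(entry))
```

Returning `None` for unknown entries, rather than guessing, keeps typos visible so someone can correct them or add them to the table. A drop-down menu prevents the problem at the source by allowing only the canonical values in the first place.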
“More and more data sources are being generated every year, and we’re trying to connect all of these different data sources together and analyze the combined information,” Doyle said. Doing so with the greatest efficiency requires a team effort at the executive level.