With its streamlined approach to developing and delivering software applications, DevOps has dazzled the IT world.
Now, DataOps is poised to remake data analytics. It relies on similar cross-functional principles, delivering agile data-based projects through a mix of automation, orchestration, and closer collaboration between once-separate teams.
To gain a better understanding of DataOps — its goals and real-world applications — we spoke with Dr. Kirk Borne, principal data scientist and executive advisor at Booz Allen Hamilton.
Connected Futures: How do you explain DataOps to a CEO?
Kirk Borne: It’s the application of DevOps concepts to data analytics projects.
DevOps concepts are tightly associated with Agile, which involves using short iterative cycles to build something, test it, and verify that it works. If it doesn’t, you refine it incrementally so you don’t build the full battleship only to find that it doesn’t float.
With DataOps, let’s say you’re building a recommender engine. You don’t want to invest millions of dollars in it only to realize that the data is incorrect, so instead you build what’s called a minimum viable product: a reasonable product, built with the least amount of effort and investment, to make sure you’re on the right track.
What’s the primary goal? Cleaner data? More-relevant data?
Part of it is cleaner data — making sure that you actually collected the right data, put it in the right format, and did the right transformation. Data cleaning is the most time-consuming part of any project. If you’re spending 50 percent or more of the project time preparing data, you want to make sure you build the right system and are using the right data. You want to have the most recent and most-relevant data.
But DataOps is basically a time-to-market solution. You want to be as effective and efficient as possible, and that includes the data cleaning phase.
Give an example of a company applying DataOps successfully. Is this really more than just a buzzword?
Several years ago, I was consulting with a financial services company that managed assets for their customers. They were trying to predict when a customer was about to move their money to another financial institution. There was a lot of discussion about complex models that included things like whether the customer had a life change — lost a job, or had a baby.
We ended up doing a little test that was hugely successful: counting the number of times a person clicked into their account every month. Customers were likely to take their money out if they visited their online account more often than the average customer to check their balances and make comparisons. So if you see this signal in the data — an uptick in the number of web clicks on a particular account — then respond to the customer with soft customer care.
The idea turned out to be validated by this simple model — or minimum viable product. The analytics team got a million-dollar investment from the board of directors to test it, and after three months, they estimated that they had saved the business $1 billion [in account retention].
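The signal Borne describes is simple enough to sketch in a few lines. The data, account names, and threshold below are all hypothetical; a real deployment would tune the cutoff against historical attrition data.

```python
from collections import Counter

# Hypothetical log: one entry per account login in a given month.
monthly_logins = ["acct_1", "acct_2", "acct_1", "acct_3", "acct_1",
                  "acct_1", "acct_2", "acct_1", "acct_1"]

# Count logins per account, then compute the average across customers.
counts = Counter(monthly_logins)
average = sum(counts.values()) / len(counts)

# Flag accounts whose login count is well above the customer average.
THRESHOLD = 1.5  # assumed multiplier; the real value would be tuned
flagged = [acct for acct, n in counts.items() if n > THRESHOLD * average]
print(flagged)  # accounts to route to soft customer care
```

The point of a minimum viable product like this is that it can be built and validated in days, before any investment in the more complex life-change models the team had been debating.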
And it wouldn’t work unless the operations team could respond quickly to the signal. What kind of internal interactions or reorganization does this require?
It’s about making sure there’s tight communication, collaboration, and integration between the data team and the operations team.
For example, while the data team is busily collecting and cleaning data, they need to be in contact with the operational people who are going to use this data, to ensure the data is correct. You don’t want to clean a data set that’s the incorrect data set, or build some kind of analytics tool that isn’t what the operations team needs.
But it’s also important not to have too much communication. It’s not necessary for every small change to go through a change review board. When you get bogged down in that kind of analysis paralysis, you never get anything done.