Big Data and Warehousing

Don’t Overhype Data Science Expectations

Why it’s important to keep outsized data science claims in perspective

Someone once told me that expectation management was the key to a happy life. Without getting into a debate about how valuable that advice is as life wisdom, I will say that it definitely proves useful for managing big data projects. As I have written here before, getting fixated on “bet the company” sorts of projects and outsized results is poor expectation management. From the discussions I’ve been involved in, probably the greatest need for expectation management in the big data space right now revolves around the “magic” of data science.

Of course, data science isn’t magic at all; rather, it’s hard work and lots of preparation. But you wouldn’t know that from much of the online chatter about the topic.

Those of you who know me are probably already guessing that I’m going to slip into debunking mode now—and you’re right. Consider the public assertion made recently by an executive at an IBM competitor. I’m paraphrasing, but he basically said that data scientists can often drive orders-of-magnitude increases in the efficiency of solutions through rapid iteration. I almost fell out of my chair when I read that. Orders of magnitude? That is definitely not honest expectation management.

Has a data scientist ever found a way to make orders-of-magnitude improvements somewhere? Sure, probably in some massively screwed-up environment. What about in an existing, well-run business? Perhaps occasionally. But in an organization that already has a seasoned data analytics or business intelligence team? I’d argue that it happens very rarely.

A quick check of the math bears this out. An increase of one order of magnitude is the same as multiplying a quantity by 10. Doing something 10 times better in a functioning business that is already running at scale is really hard. If you could easily convert 10 times more prospects to customers, you would be doing it already.
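To make the compounding concrete, here is a quick back-of-the-envelope sketch. The `gains_needed` helper is mine, purely for illustration; it just counts how many compounding improvements of a given size it takes to reach a target multiple:

```python
import math

def gains_needed(per_test_gain, target_multiple):
    """Number of compounding improvements of size per_test_gain
    (e.g., 0.001 for 0.1%) needed to reach target_multiple
    (e.g., 10 for one order of magnitude)."""
    return math.ceil(math.log(target_multiple) / math.log(1 + per_test_gain))

# 0.1% per successful experiment vs. one order of magnitude:
print(gains_needed(0.001, 10))  # 2304 successful experiments
# Even a hefty 5% gain per iteration takes dozens of them:
print(gains_needed(0.05, 10))   # 48
```

In other words, thousands of wins at the scale a disciplined shop actually achieves, stacked end to end, are what a single "order of magnitude" really means.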

Take, for example, Google’s constant experimentation. They tinker with the goal of improving things by tenths of a percentage point (which, given the scale of their business, clearly makes a meaningful impact). They are not aiming for tenfold improvements. Setting the expectation that you’ll grow revenue by 10 times, produce widgets with one-tenth the number of defects, or achieve 10 times better retention when your company is already operating competently is just not realistic.

Another good example of this came up in a conversation I had recently with the head of analytics for a major B2C firm. The folks at this firm are very smart, and they are at a point where they can identify an individual using bits of consumer information from multiple sources about 20 percent of the time. We’re about to start working with them using the next generation of IBM® entity resolution and big data technologies, and we want to help drive their positive identification rate to 22 percent or even 24 percent over the next 12 months. As with the Google example, this improvement would have a big impact on their business—but orders of magnitude? No way. I would never try to explain the potential of what we can do in terms of orders of magnitude. And you shouldn’t either.
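For scale, it's worth putting that identification-rate goal in order-of-magnitude terms. The numbers below are the ones from the example above; the arithmetic is mine:

```python
import math

current_rate = 0.20  # positive identifications today
target_rate = 0.24   # the upper end of the 12-month goal

relative_gain = target_rate / current_rate        # 1.2x improvement
orders = math.log10(relative_gain)                # ~0.08 orders of magnitude

print(f"{relative_gain:.1f}x, or {orders:.2f} orders of magnitude")
```

A 1.2x gain is a big deal for a business at scale, and it is still less than a tenth of a single order of magnitude.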

Don’t fall victim to the hype. Instead, have the people who actually do the work volunteer the target improvements, and then sanity-check those measures against how your business actually runs. You will be far better off by targeting modest, incremental, and sustainable goals than setting unrealistic expectations.

What do you think? Let me know in the comments.


Tom Deutsch

Tom Deutsch (Twitter: @thomasdeutsch) serves as a Program Director on IBM’s Big Data team. He played a formative role in the transition of Hadoop-based technology from IBM Research to IBM Software Group, and he continues to be involved in IBM Research’s big data activities and their transition to commercial products. Tom created the Hadoop-based IBM BigInsights product and has since spent several years helping customers with Apache Hadoop, BigInsights and Streams technologies, identifying architecture fit, developing business strategies and managing early-stage projects across more than 200 customer engagements. Tom has co-authored a big data book and multiple thought papers.

Prior to that, Tom worked in the Information Management CTO’s office on a team focused on emerging technology, where he helped customers adopt IBM’s innovative Enterprise Mashups and Cloud offerings. Tom came to IBM through the FileNet acquisition, where he was responsible for FileNet’s flagship Content Management product and spearheaded FileNet product initiatives with other IBM software segments, including Lotus and InfoSphere.

With more than 20 years in the industry, and as a veteran of two startups, Deutsch is an expert on the technical, strategic and business information management issues facing the enterprise today. Most of his work has centered on emerging technologies and business challenges, and he brings a strong focus on the cross-functional work required to make early-stage projects succeed.

Deutsch earned a bachelor’s degree from Fordham University in New York and an MBA degree from the University of Maryland University College.