
Don’t Overhype Data Science Expectations

Why it’s important to keep outsized data science claims in perspective

Someone once told me that expectation management was the key to a happy life. Without getting into a debate about how valuable that advice is as life wisdom, I will say that it definitely proves useful for managing big data projects. As I have written here before, getting fixated on “bet the company” sorts of projects and outsized results is poor expectation management. From the discussions I’ve been involved in, probably the greatest need for expectation management in the big data space right now revolves around the “magic” of data science.

Of course, data science isn’t magic at all; rather, it’s hard work and lots of preparation. But you wouldn’t know that from much of the online chatter about the topic.

Those of you who know me are probably already guessing that I’m going to slip into debunking mode now—and you’re right. Consider the public assertion made recently by an executive at an IBM competitor. I’m paraphrasing, but he basically said that data scientists can often drive orders-of-magnitude increases in the efficiency of solutions through rapid iteration. I almost fell out of my chair when I read that. Orders of magnitude? That is definitely not honest expectation management.

Has a data scientist ever found a way to make orders-of-magnitude improvements somewhere? Sure, probably in some massively screwed-up environment. What about in an existing, well-run business? Perhaps occasionally. But in an organization that already has a seasoned data analytics or business intelligence team? I’d argue that it happens very rarely.

A quick check of the math bears this out. An increase of one order of magnitude means multiplying a quantity by 10; "orders of magnitude," plural, means a factor of at least 100. Doing something 10 times better in a functioning business that is already running at scale is really hard. If you could easily convert 10 times more prospects into customers, you would be doing it already.
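To make that concrete, here is a minimal back-of-the-envelope sketch in Python. The prospect count and the 2 percent baseline conversion rate are hypothetical, chosen purely for illustration:

    # Back-of-the-envelope check on what "orders of magnitude" implies.
    # All figures are hypothetical, for illustration only.

    monthly_prospects = 100_000
    conversion_rate = 0.02  # assumed 2% baseline conversion rate

    print(f"Baseline: {monthly_prospects * conversion_rate:,.0f} new customers per month")

    # One order of magnitude = 10x; two orders of magnitude = 100x.
    for factor in (10, 100):
        implied_rate = conversion_rate * factor
        print(f"{factor}x better implies a {implied_rate:.0%} conversion rate")

    # Prints:
    # Baseline: 2,000 new customers per month
    # 10x better implies a 20% conversion rate
    # 100x better implies a 200% conversion rate -- which is not even possible

One order of magnitude already demands a conversion rate most businesses will never see; two orders of magnitude is arithmetically impossible.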

Take, for example, Google's constant experimentation. They tinker with the goal of improving things by tenths of a percentage point (which, given the scale of their business, clearly makes a meaningful impact). They are not aiming for tenfold improvements. Setting the expectation that you'll grow revenue by 10 times, produce widgets with one-tenth the number of defects, or achieve 10 times better retention when your company is already operating effectively is just not realistic.
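The arithmetic behind that claim is simple. Here is a quick sketch; the $50 billion revenue figure is entirely hypothetical, standing in for a web-scale business:

    # Why a 0.1 percentage-point gain matters at scale.
    # The revenue figure is hypothetical, for illustration only.

    annual_revenue = 50_000_000_000  # $50B, a made-up stand-in for a web-scale business
    improvement = 0.001              # one tenth of one percent

    gain = annual_revenue * improvement
    print(f"A 0.1% gain on ${annual_revenue:,} is ${gain:,.0f} per year")

    # Prints:
    # A 0.1% gain on $50,000,000,000 is $50,000,000 per year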

Another good example of this came up in a conversation I had recently with the head of analytics for a major B2C firm. The folks at this firm are very smart, and they are at a point where they can identify an individual using bits of consumer information from multiple sources about 20 percent of the time. We’re about to start working with them using the next generation of IBM® entity resolution and big data technologies, and we want to help drive their positive identification rate to 22 percent or even 24 percent over the next 12 months. As with the Google example, this improvement would have a big impact on their business—but orders of magnitude? No way. I would never try to explain the potential of what we can do in terms of orders of magnitude. And you shouldn’t either.
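Run the same quick math on those numbers (only the 20, 22, and 24 percent figures come from the example above):

    # Relative improvement implied by lifting an identification rate
    # from 20% to 22% or 24% -- the figures from the example above.

    baseline = 0.20
    for target in (0.22, 0.24):
        relative_lift = (target - baseline) / baseline
        print(f"{baseline:.0%} -> {target:.0%} is a {relative_lift:.0%} relative improvement")

    # Prints:
    # 20% -> 22% is a 10% relative improvement
    # 20% -> 24% is a 20% relative improvement

A 10 to 20 percent relative improvement is a meaningful, defensible goal. A full order of magnitude would mean a 200 percent identification rate, which is not possible.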

Don't fall victim to the hype. Instead, have the people who actually do the work volunteer the target improvements, and then sanity-check those targets against how your business actually runs. You will be far better off targeting modest, incremental, and sustainable goals than setting unrealistic expectations.

What do you think? Let me know in the comments.

Tom Deutsch

Tom Deutsch (Twitter: @thomasdeutsch) is chief technology officer (CTO) for the IBM Industry Solutions Group, and focuses on data science as a service. Tom played a formative role in the transition of Apache Hadoop–based technology from IBM Research to the IBM Software Group, and he continues to be involved with IBM Research's big data activities and the transition from research to commercial products. In addition, he created the IBM® InfoSphere® BigInsights™ Hadoop-based software, and he has spent several years helping customers with Hadoop, InfoSphere BigInsights, and InfoSphere Streams technologies by identifying architecture fit, developing business strategies, and managing early-stage projects across more than 200 engagements. Tom came to IBM through the FileNet acquisition, where he had responsibility for FileNet's flagship content management product and spearheaded FileNet product initiatives with other IBM software segments, including the Lotus and InfoSphere segments. Tom has also worked in the Information Management CTO's office and with a team focused on emerging technology, helping customers adopt innovative IBM enterprise mash-ups and cloud-based offerings. With more than 20 years of experience in the industry, and as a veteran of two startups, Tom is an expert on the technical, strategic, and business information management issues facing the enterprise today. Most of his work has been on emerging technologies and business challenges, and he brings a strong focus on the cross-functional work required to help early-stage projects succeed. Tom has coauthored a book on big data and multiple thought-leadership papers. He earned a bachelor's degree from Fordham University in New York and an MBA from the University of Maryland University College.