
Why Big Data Doesn’t Require a Big Idea

Bigger isn’t better when choosing your first big data project

A few months ago, I wrote an article arguing in favor of controlled experimentation as a corporate strategy for learning about big data. This approach flies in the face of the common misconception that companies should only embrace mature technologies with clear ROI. This month, I’d like to examine another big data misconception: the myth that leveraging big data demands a big idea.

Sure, big ideas are fun. Some big ideas really do change the world, thankfully. But when you really dig into how big ideas are operationalized, it becomes clear that good old-fashioned hard work rules the day. I know this idea isn’t consistent with all the ill-informed hype—but unlike the hype, it happens to be true.

I was reminded of this recently by yet another LinkedIn exchange on a Forbes article by Bob Evans. In the article, Evans gushed about a piece that Constellation Research CEO Ray Wang published on a Harvard Business Review blog. The gist of Wang’s piece is that big data opportunities fall into only three buckets:

  1. Differentiation. Wang argues that “big data offers opportunities for many more service offerings that can reinvent the customer experience based on better relevance of the experience.”
  2. Brokering. In support of extreme personalization, Wang says “these new analysis and insight streams could be created and maintained by information brokers who could sort by age, location, interest, and other categories.”
  3. Network monetizers. This category covers finding new ways of using personalized information and delivery mechanisms. For example, “large wireless carriers can map traffic flows down to the cell tower. Using this data, carriers could work with display advertisers to optimize advertising rates for the most popular routes on football game days based on digital foot traffic.”

Wang’s three opportunities are fine, but they also feed into the hype that big ideas are the only place to start. This just isn’t true. In my experience, the pragmatic use cases are a much better place to start. I know it can be more interesting to focus on big ideas right out of the gate, but in most cases, the right opportunity is a modest and pragmatic one. Swinging for the fences first time up is simply NOT a best practice. In fact, it goes directly against the project methodology I created based on all of IBM’s years of big data project work.

Going after a big idea as your big data starting point may work for a venture-funded firm whose whole existence is based on a swing-for-the-fences new product. But for the vast majority of enterprises, it is simply bad methodology. I’d also strongly suggest that the Network Monetizer idea is about to come under serious pressure from privacy considerations (more on that later this year).

I’m not saying that Wang’s three opportunity buckets are conceptually incorrect, but he is skipping over dozens of better near-term places to start. Sometimes business users just need to be able to run their reports faster, and there is nothing wrong with that. Perhaps you can make a case for differentiation as a place to start (provided your goal is to walk, not run, by simply understanding customer behavior rather than trying to comprehensively reinvent the customer experience and/or how the company functions).

But if reinventing the whole company with your first big data project is an iffy idea, where do you start? First, brush up on Fit for Purpose architectures. Then keep these guidelines in mind:

  • Boil a bathtub, not an ocean
  • Pick a proven path
  • Make sure your project can be done offline and is non-disruptive to existing systems
  • Ensure that there is low-hanging fruit for additional insights
  • Use a data set that is already stored, but under-instrumented or overly summarized
  • Choose a project where initial findings can be arrived at in 4 weeks or less once the data is ready
  • Make sure your initial use cases are accretive to the next set of use cases
  • Leverage common technology for the next set of use cases

More on these ideas will follow in future columns. In the meantime, I’ve recorded several webcasts that cover these topics in an interactive format.

So what do you think? Does this all make sense? Do you have different or better ideas to propose? Let me know in the comments.
 

 

Tom Deutsch

Tom Deutsch (Twitter: @thomasdeutsch) is chief technology officer (CTO) for the IBM Industry Solutions Group and focuses on data science as a service. Tom played a formative role in the transition of Apache Hadoop–based technology from IBM Research to the IBM Software Group, and he continues to be involved with IBM Research's big data activities and the transition from research to commercial products. In addition, he created the IBM® InfoSphere® BigInsights™ Hadoop–based software, and he has spent several years helping customers with Hadoop, InfoSphere BigInsights, and InfoSphere Streams technologies by identifying architecture fit, developing business strategies, and managing early-stage projects across more than 200 engagements. Tom came to IBM through the FileNet acquisition, where he had responsibility for FileNet’s flagship content management product and spearheaded FileNet product initiatives with other IBM software segments, including the Lotus and InfoSphere segments. Tom has also worked in Information Management in the CTO’s office and with a team focused on emerging technology. He helped customers adopt innovative IBM enterprise mash-ups and cloud-based offerings. With more than 20 years of experience in the industry, and as a veteran of two startups, Tom is an expert on the technical, strategic, and business information management issues facing the enterprise today. Most of his work has been on emerging technologies and business challenges, and he brings a strong focus on the cross-functional work required for early-stage projects to succeed. Tom has coauthored a book on big data and multiple thought-leadership papers. He earned a bachelor’s degree from Fordham University in New York and an MBA degree from the University of Maryland University College.