Big Data and Warehousing

Relishing the Big Data Burger

How Hadoop wraps the data warehouse in a savory big data sandwich

Maybe it was all those years I spent at McDonald’s Corporation, but when I think about big data and warehousing, I can’t help thinking about a hamburger. No, not just because I’m hungry. I think the hamburger offers a good analogy for the big data and warehousing relationship as it exists today. Let me explain.

When Apache Hadoop first came into our world, there was an immediate “Oh my gosh, the data warehouse is dead” reaction, as if Hadoop were going to take all the data. Well, not so fast.

Those of us well-seasoned warehouse folks knew this transition was not going to happen quickly. Why? Because in the enterprise analytic architecture, the warehouse remains the meat of the sandwich.

 

Facing the transactional data reality

Companies have invested a lot of money, people, and time to build out these systems, on which they run day-to-day reporting and analytics. They still need a way to bring in all the transactional data that comes from running the business and report on it. How were sales? Which region did better than the others? That system is not changing anytime soon.

In addition, there was the pesky issue of how to access the data in Hadoop versus accessing it in the structured warehouse. I must admit, my favorite quote of the year was from Alistair Croll talking at the Strata Conference about how ironic it was that the future of NoSQL ended up being SQL.

Yet it was true. Once we started to use Hadoop more heavily, we came up against the cold reality that we needed a more SQL-like interface to get at all that data. The reason was simple: SQL is what we know, it’s the skill we already have in our shops, and we want and need analytic portability. Moreover, the shortage of Hadoop skills continues to plague many organizations; there simply aren’t enough of those skills to go around, even in the largest cities. Cutting over completely from the data warehouse to Hadoop seemed a bit further off than originally anticipated.
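To make that concrete, here is a minimal sketch of what a SQL-on-Hadoop query can look like. I’m using PySpark’s SQL interface purely as an illustration; the table and column names (sales_transactions, region, sale_amount) are made up for the example, not taken from any particular system.

```python
# Minimal sketch: asking a familiar warehouse-style question of data in Hadoop,
# using plain SQL. Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sql-on-hadoop-sketch")
    .enableHiveSupport()      # lets spark.sql() see Hive-backed tables in Hadoop
    .getOrCreate()
)

# The same kind of question the warehouse answers every day:
# how were sales, and which region did better than the others?
regional_sales = spark.sql("""
    SELECT region,
           SUM(sale_amount) AS total_sales
    FROM   sales_transactions
    GROUP  BY region
    ORDER  BY total_sales DESC
""")

regional_sales.show()
```

The engine matters less than the point: the question gets asked in plain old SQL, the skill our shops already have.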

The reality is that the data warehouse is still the meat of the reporting and analytics that organizations do day in and day out. Hadoop hasn’t changed that reality quite yet. What has changed, though, is the realization that this great technology called Hadoop can now help businesses tap data sources they couldn’t use before. This technology, as it turns out, is a very nice complement to the burger, much like a nicely warmed pretzel roll is to a cheddar jalapeño pub burger—are you hungry yet?

 

Landing the data cost-effectively

Why is Hadoop the metaphorical bun for the big data burger? Well, as Hadoop moved further into production environments, two very prominent use cases emerged. We at IBM refer to the first use case as the landing zone. It is an area of the architecture where organizations are building out the capability to land all data—both structured and unstructured.

Let’s face it, Hadoop is a lot less expensive than the data warehouse, and by landing data there first, organizations are no longer limited to just structured data for analytics. Most likely, they will do some exploration in the landing zone as well, especially if most of the data they are leveraging is highly disposable.
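To picture what “landing” data looks like in practice, here is a minimal sketch, again in PySpark. Every path, format, and column name here is a hypothetical stand-in, not a prescription.

```python
# Minimal landing-zone sketch: keep the raw data cheap and untouched,
# and add a lightly structured copy alongside it for exploration.
# All paths and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("landing-zone-sketch").getOrCreate()

# 1. Land raw, semi-structured data (e.g., clickstream JSON) exactly as it arrived.
raw = spark.read.json("hdfs:///landing/raw/clickstream/2014-06-01/")

# 2. Write a columnar copy, partitioned by date, for cheap and fast exploration.
(raw
 .write
 .mode("overwrite")
 .partitionBy("event_date")
 .parquet("hdfs:///landing/refined/clickstream/"))
```

The raw copy stays cheap and untouched; the refined copy simply makes exploration easy, and both can be thrown away if the data turns out to be disposable.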

The other prominent use case is leveraging Hadoop for archiving and offloading the data warehouse. One of the biggest challenges with big data is managing big costs. Yes, of course we want more data, and that makes data lifecycle management more important than ever before. Cold data should be moved to a cost-effective environment—not just to help manage costs, but to maintain performance of analytics against the hot data in the warehouse. Hadoop is a great solution for archival and warehouse offload.
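As a rough sketch of the offload pattern, here is what pulling cold data out of a JDBC-reachable warehouse and parking it in Hadoop might look like. The connection details, table name, and “cold” cutoff date are all assumptions for illustration.

```python
# Minimal warehouse-offload sketch: pull cold rows out of the warehouse
# over JDBC and archive them as Parquet in Hadoop.
# Connection details, table names, and the cutoff date are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("warehouse-offload-sketch").getOrCreate()

cold_orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://warehouse-host:50000/SALESDB")
    .option("dbtable", "(SELECT * FROM orders WHERE order_date < '2012-01-01') AS cold")
    .option("user", "etl_user")
    .option("password", "etl_password")
    .load()
)

# Archive to inexpensive storage; the hot data stays in the warehouse.
cold_orders.write.mode("append").parquet("hdfs:///archive/orders/")
```

Once the archive is verified, the cold rows can be dropped from the warehouse so analytics against the hot data keep performing.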

Got the visual now? The data warehouse remains the meat of the analytic architecture, but it is nicely sandwiched and complemented by the landing and archival capability of Hadoop in the analytic architecture.

Now, that’s certainly not to say that Hadoop should be limited to just those two use cases. Indeed, the combination of these use cases can enhance existing analytics; but don’t forget that there is a whole new world to explore with Hadoop alone as well. Just make sure to start with the most important part—a business goal or question that needs answering.

And there we have the big data burger. Now about those fries…

If you have any thoughts or questions, please share them in the comments.
 

 

Nancy Hensley

Nancy Hensley has been in the data warehousing and BI industry for over 19 years. Nancy worked in the early days of enterprise data warehousing, spatial warehousing, and executive reporting as a customer in a Fortune 50 company and joined IBM in 1999. In 2004, Nancy led the team that brought the first IBM data warehouse appliance to market. From her position leading the data warehouse architect team in the field, Nancy moved into the development organization, focusing on data warehouse solutions and database technology. Today Nancy works in product marketing and strategy for IBM data warehouse solutions. You can follow Nancy on Twitter @nancykoppdw.