Maybe it was all those years I spent at McDonald’s Corporation, but when I think about big data and warehousing, I can’t help thinking about a hamburger. No, not just because I’m hungry. I think the hamburger offers a good analogy for the relationship between big data and the warehouse as it exists today. Let me explain.
When Apache Hadoop first came into our world, there was an immediate “Oh my gosh, the data warehouse is dead” reaction: Hadoop was going to take all the data. Well, not so fast.
Those of us well-seasoned warehouse folks knew this transition was not going to happen quickly. Why? Because in the enterprise analytic architecture, the warehouse remains the meat of the sandwich.
Companies have invested a great deal of money, people, and time to build out these systems, on which they run day-to-day reporting and analytics. They still need a way to bring in all the transactional data that comes from running the business and report on it. How were sales? Which region did better than the others? That system is not changing anytime soon.
In addition, there was the pesky issue of how to access the data in Hadoop versus accessing it in the structured warehouse. I must admit, my favorite quote of the year was from Alistair Croll talking at the Strata Conference about how ironic it was that the future of NoSQL ended up being SQL.
Yet it was true. Once we started using Hadoop more heavily, we ran into the cold reality that we needed a more SQL-like interface to get at all that data. The reason was simple: SQL is what we know, it’s the skill we already had in our shops, and we wanted and needed analytic portability. Moreover, the shortage of Hadoop skills continues to plague many organizations; there just aren’t enough of those skills to go around, even in the largest cities. Cutting over completely from the data warehouse to Hadoop seemed a bit further off than originally anticipated.
The reality is that the data warehouse is still the meat of the reporting and analytics that organizations do day in and day out. Hadoop hasn’t changed that reality quite yet. What has changed, though, is the realization that this great technology called Hadoop can now help businesses tap data sources they couldn’t use before. This technology, as it turns out, is a very nice complement to the burger, much like a nicely warmed pretzel roll is to a cheddar jalapeño pub burger—are you hungry yet?
Why is Hadoop the metaphorical bun for the big data burger? Well, as Hadoop moved further into production environments, two very prominent use cases emerged. We at IBM refer to the first use case as the landing zone. It is an area of the architecture where organizations are building out the capability to land all data—both structured and unstructured.
Let’s face it, Hadoop is a lot less expensive than the data warehouse, and by landing data there first, organizations are no longer limited to just structured data for analytics. Most likely, they will do some exploration in the landing zone as well, especially if most of the data they are leveraging is highly disposable.
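The core idea of the landing zone is to write raw records as-is into cheap, partitioned storage and apply schema later, at read time. Here is a minimal sketch of that pattern in Python; the function name, the date-partitioned layout, and the JSON-lines format are illustrative assumptions, not any particular product’s API:

```python
import json
import os
from datetime import date

def land_record(base_dir, source, record):
    """Append a raw record to a date-partitioned landing path,
    e.g. base_dir/clickstream/2014-03-01/data.jsonl.

    Records are stored exactly as they arrive; no schema is
    enforced here -- structure is imposed later, at read time.
    """
    partition = os.path.join(base_dir, source, date.today().isoformat())
    os.makedirs(partition, exist_ok=True)
    path = os.path.join(partition, "data.jsonl")
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return path
```

In a real deployment the base directory would be HDFS rather than a local filesystem, but the principle is the same: land everything first, decide what is worth promoting to the warehouse afterward.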
The other prominent use case is leveraging Hadoop for archiving and offloading the data warehouse. One of the biggest challenges with big data is managing big costs. Yes, of course we want more data, and that makes data lifecycle management more important than ever before. Cold data should be moved to a cost-effective environment—not just to help manage costs, but to maintain performance of analytics against the hot data in the warehouse. Hadoop is a great solution for archival and warehouse offload.
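The offload decision usually comes down to a simple age-based policy: rows newer than some retention window stay hot in the warehouse, and everything older becomes a candidate for the Hadoop archive. A minimal sketch of that split, where the `ts` field and the 90-day default window are assumptions for illustration:

```python
from datetime import date, timedelta

def split_hot_cold(rows, today, hot_days=90):
    """Partition rows into hot (recent; stays in the warehouse)
    and cold (older than the retention window; a candidate for
    offload to the Hadoop archive).

    Each row is a dict with a 'ts' date field; 'hot_days' is the
    retention window in days.
    """
    cutoff = today - timedelta(days=hot_days)
    hot = [r for r in rows if r["ts"] >= cutoff]
    cold = [r for r in rows if r["ts"] < cutoff]
    return hot, cold
```

The point of the policy is exactly the one above: cold data moves to the cheaper tier, which both controls cost and keeps warehouse queries against the hot data fast.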
Got the visual now? The data warehouse remains the meat of the analytic architecture, nicely sandwiched and complemented by Hadoop’s landing and archival capabilities.
Now, that’s certainly not to say that Hadoop should be limited to just those two use cases. Indeed, the combination of these use cases can enhance existing analytics; but don’t forget that there is a whole new world to explore with Hadoop alone as well. Just make sure to start with the most important part—a business goal or question that needs answering.
And there we have the big data burger. Now about those fries…
If you have any thoughts or questions, please share them in the comments.