
The Marriage of Hadoop and the Data Warehouse

A match made in heaven

Big data: everybody is talking about it. The buzz generated around this topic almost eclipses the buzz around traditional data warehousing. Some big data enthusiasts have even speculated that all enterprise data will be hosted by an Apache Hadoop–based system in the near future and that the enterprise data warehouse (EDW) will be dead.

Well, there is no doubt that traditional data warehouse architecture is evolving. I have been writing and blogging about that for over a year now—but dead? Hardly. In fact, while everyone is talking about how one technology or architecture may win out over the other, IBM is having a different conversation.

At IBM, we prefer to talk about the marriage of Hadoop and the data warehouse because together, they really make the perfect couple. Think about it: for a traditional data warehouse shop, big data is an opportunity to consume data that it could not consume using traditional warehousing architectures.

But why aren’t traditional data warehouses up to the task? Well, for several reasons. First, the data warehouse has traditionally been architected to use structured data from our business systems to analyze things about our business. This data is cleansed, modeled, distributed, governed, and maintained for historical analysis. The data we store in the data warehouse is predictable in both structure and ingest rate.

In contrast, big data is unpredictable. It comes in many structures, and its volume is simply too much for the EDW, especially since we are most likely to sift through lots of data to find what we really need. Then we may just decide to discard it because, in some cases, the shelf life of this data is significantly shorter than that of warehouse data. If we decide to keep all that data, we need cheaper solutions than the EDW to store the unstructured data for historical analysis (which is yet another argument for using Hadoop in addition to the warehouse).
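To make the split concrete, here is a minimal sketch of how incoming records might be routed: records that match a known, modeled schema go to the warehouse, and everything else lands in a cheaper raw store. The schema, the Router class, and the two in-memory lists are all hypothetical stand-ins for illustration; they are not IBM or Hadoop APIs.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the fields a cleansed, modeled warehouse table expects.
WAREHOUSE_SCHEMA = {"customer_id": int, "order_total": float, "order_date": str}


def is_structured(record):
    """Return True only if the record has exactly the expected fields and types."""
    if not isinstance(record, dict) or set(record) != set(WAREHOUSE_SCHEMA):
        return False
    return all(isinstance(record[name], kind) for name, kind in WAREHOUSE_SCHEMA.items())


@dataclass
class Router:
    # Both lists are in-memory stand-ins (assumptions), not real storage systems.
    warehouse: list = field(default_factory=list)  # stands in for the EDW
    raw_store: list = field(default_factory=list)  # stands in for a Hadoop-based store

    def route(self, record):
        """Send predictable, structured records to the warehouse; the rest to the raw store."""
        if is_structured(record):
            self.warehouse.append(record)
            return "warehouse"
        self.raw_store.append(record)
        return "raw_store"


router = Router()
router.route({"customer_id": 1, "order_total": 9.5, "order_date": "2013-01-01"})
router.route("Loving the new store layout!")  # free text, no fixed structure
```

In practice the raw store is where the sifting happens: data that proves useful can be modeled and promoted, and the rest can be discarded once its shelf life ends.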

Big data is an opportunity for many customers, and Hadoop now offers us the ability to consume new sources of data that make our analytics even smarter. But this new frontier is a complement to traditional data warehouse architectures, not a replacement. We are still going to supply traditional analysis to all of the business areas (finance, marketing, sales, customer service, and so on)—none of that analysis will be going away anytime soon. But let’s face it: we should be expanding our analytics menu to include new sources that offer additional insight and new tools that allow us to do things we couldn’t in the past, such as sentiment analysis.

I believe that big data was one of the key motivators in the evolution of the EDW architecture—but it wasn’t the only one. The continued growth of appliances, the high demand for time to value, the need for agility, and even simplicity in our solutions also played large roles.

Think about it: agility and simplicity? Those were not words we used very often as we built our enterprise data warehouses! However, the facts are pretty simple. Many large EDW projects were never able to achieve their full potential because they became too complex and therefore far less agile than the business had hoped for. It’s also a fact that companies that do use analytics to drive decisions are better performers. These companies show a 49 percent improvement in compound annual growth rate (CAGR), they do 20 times better on profit growth, and they show a 30 percent uptick in return on investment. No wonder most companies are in a hurry to implement.

 

The secret to building this harmonious relationship is to really understand the type of analytics you have today, as well as what you’ll need in the future. The picture we once drew of the EDW now looks more like a thriving ecosystem. We have gone from using an architecture that focuses on serving up enterprise data to using an architecture that serves up enterprise data and smarter analytics.

Think about all types of data with all types of analytics. Now that’s smarter analytics!

We have made great progress. Let’s keep it going.
 

 

Nancy Hensley

Nancy Hensley has been in the data warehousing and business intelligence (BI) industry for over 20 years. Nancy worked in the early days of enterprise data warehousing, spatial warehousing, and executive reporting as a customer in a Fortune 50 company and joined IBM in 1999. In 2004, Nancy led the team that brought the first IBM data warehouse appliance to market. From her position leading the data warehouse architect team in the field, Nancy moved into the development organization focusing on data warehouse solutions and database technology. Today Nancy works in product marketing and strategy for IBM data warehouse solutions. Follow Nancy on Twitter @nancykoppdw.

  • Ramki

    Well, the real question still remains at large and has not been called out explicitly.

    “We are still going to supply traditional analysis to all of the business areas (finance, marketing, sales, customer service, and so on)—none of that analysis will be going away anytime soon.” This is really the bone of contention. Can Hadoop do a better job than appliances or a structured EDW?

    Please guide me to some evaluations on this front.

    Thanks