Big Data and Warehousing

If Hadoop Was Easy, Everyone Would Be Doing It

IBM PureData System for Hadoop aims to simplify big data for the enterprise

The advantages of enhancing an analytic ecosystem with Apache Hadoop capabilities to leverage unstructured data have never been clearer. Just a few years ago, this technology was not available to enterprises; all we had was good old structured data in the data warehouse. Today, the picture is very different. Hadoop presents tremendous opportunities to evolve from traditional data warehousing to big data platforms where organizations can process all kinds of data.

At IBM, we’ve spent quite a bit of time looking at how customers are leveraging big data technologies. Five key use cases have emerged:

  1. Enriching an organization’s information base with big data exploration
  2. Improving customer interactions with a 360-degree view of the customer
  3. Preventing crime and fraud with enhanced information in real time
  4. Optimizing infrastructure and monetizing data with operational analysis
  5. Gaining IT efficiency and scale with data warehouse augmentation

These use cases come to us from hundreds of engagements and interactions with clients of all sizes, in virtually every industry. Whether you intend to enhance the existing ecosystem with new data sources, leverage machine data, or gain efficiencies by leveraging Hadoop as an active archive, there are clear advantages to this technology.

But how easy is it to actually realize these advantages?

Well, if leveraging Hadoop were truly easy, implementation rates would have been higher than what we’ve seen over the last few years. The opportunity is there. The technology is there. So why isn’t every enterprise using Hadoop?

Pundits say 2013 will be the year when we move past the hype of big data and finally move into implementation mode. And the evidence seems to support this idea; a recent study from EMA Research suggests the number of companies that are implementing Hadoop projects in production is starting to push past 50 percent.

As it turns out, leveraging big data is not quite as easy as we originally thought. Open-source Hadoop does offer a file system—but it lacks the integration across the entire system that is necessary to simplify the adoption and consumption of big data in the enterprise.

Let’s face it: coming from the world of data warehousing, we are used to nicely integrated, easy-to-manage systems. Hadoop takes us back to the integration challenges we already overcame in traditional warehousing. Over the years, we’ve simplified data warehouses and made them much more agile. Today, we can deploy solutions in hours to satisfy even the most complex analytic requirements. These solutions are easy to maintain, easy to integrate, and over the years have required fewer resources than nonintegrated systems, thanks to autonomics.

Open-source Hadoop cannot deliver the integration we are used to because it is not complete. That’s one of the reasons IBM introduced IBM® InfoSphere® BigInsights™ a few years ago. We understood then that our customers needed more value on top of the distribution so that it could be more easily consumed in an enterprise.

On April 3, IBM announced several new innovations that address that exact goal. We made several key improvements to InfoSphere BigInsights—including the introduction of BigSQL, which offers a SQL-like interface for Hadoop—that are intended to help make big data more consumable.

My favorite Twitter quote from the Strata conference was, “Isn’t it ironic how the future of NoSQL has become SQL?” Actually, it’s not ironic at all—it’s just reality. The truth is that IT departments don’t have the luxury of retraining everyone who has mastered SQL.

We have taken simplicity even further, too. On April 3, IBM also announced its intention to release an IBM PureData™ System that is designed to ease adoption of Hadoop in the enterprise. The new system, PureData System for Hadoop, will be the latest member of the IBM PureSystems™ family that helps reduce the cost of IT and enables organizations to use different types of data services. The system will leverage InfoSphere BigInsights and deliver Hadoop with the simplicity of an appliance.

PureData System for Hadoop is designed around three key tenets:

  1. Built-in expertise to help accelerate time-to-value for big data. PureData System for Hadoop will help accelerate the delivery of Hadoop for the enterprise in hours, instead of weeks. In addition, because the system leverages InfoSphere BigInsights, it can accelerate insight for the business with built-in visualization and analytic accelerators for social data, text analytics, and machine data.
  2. Simplified experience. PureData System for Hadoop offers a single management console for the whole system. This helps reduce the need for more staff and simplifies administration of the system.
  3. Integrated by design. PureData System for Hadoop delivers both robust security and high availability that go beyond what open source solutions offer today. In addition, the system is the only Hadoop solution that offers integrated archiving, which effectively augments the data warehouse by leveraging Hadoop as an active archive for the analytic ecosystem.

PureData System for Hadoop will help enable the use cases being realized in the current market by simplifying the delivery of Hadoop. Check back for more details on these new solutions in the coming months.

In the meantime, if you have questions or feedback, please leave a note in the comments.
 

 
Nancy Hensley

Nancy Hensley has been in the data warehousing and BI industry for over 19 years. Nancy worked in the early days of enterprise data warehousing, spatial warehousing, and executive reporting as a customer in a Fortune 50 company and joined IBM in 1999. In 2004, Nancy led the team that brought the first IBM data warehouse appliance to market. From her position leading the data warehouse architect team in the field, Nancy moved into the development organization, focusing on data warehouse solutions and database technology. Today Nancy works in product marketing and strategy for IBM data warehouse solutions. You can follow Nancy on Twitter @nancykoppdw.

  • http://twitter.com/edbergavera Eduardo Bergavera

    I hope IBM would soon be offering Trial version of the PureData System for Hadoop. All in the cloud.

    • http://twitter.com/nancykoppdw Nancy Kopp-Hensley

      That’s a great suggestion, and stay tuned! Meanwhile, InfoSphere BigInsights is available in the cloud, and you can download a free version to take for a test drive. Check it out!