The advantages of enhancing an analytic ecosystem with Apache Hadoop capabilities to leverage unstructured data have never been clearer. Just a few years ago, this technology was not available to enterprises; all we had was good old structured data in the data warehouse. Today, though, the picture is very different. Hadoop presents tremendous opportunities to evolve from traditional data warehousing to big data platforms where organizations can process all kinds of data.
At IBM, we’ve spent quite a bit of time looking at how customers are leveraging big data technologies. Five key use cases have emerged:
These use cases come to us from hundreds of engagements and interactions with clients of all sizes, in virtually every industry. Whether you intend to enhance the existing ecosystem with new data sources, leverage machine data, or gain efficiencies by using Hadoop as an active archive, there are clear advantages to this technology.
But how easy is it to actually realize these advantages?
Well, if leveraging Hadoop were truly easy, implementation rates would have been higher than what we've seen over the last few years. The opportunity is there. The technology is there. So why isn't every enterprise using Hadoop?
Pundits say 2013 will be the year when we move past the hype of big data and finally move into implementation mode. The evidence seems to support this idea: a recent study from EMA Research suggests the share of companies implementing Hadoop projects in production is starting to push past 50 percent.
As it turns out, leveraging big data is not quite as easy as we originally thought. Open-source Hadoop does offer a file system, but it lacks the integration across the entire system that is necessary to simplify the adoption and consumption of big data in the enterprise.
Let's face it: coming from the world of data warehousing, we are used to nicely integrated, easy-to-manage systems. Hadoop takes us back to the integration challenges we already overcame in traditional warehousing. Over the years, we've simplified data warehouses and made them much more agile. Today, we can deploy solutions in hours to satisfy even the most complex analytic requirements. These solutions are easy to maintain and easy to integrate, and, thanks to autonomics, they have required fewer resources over the years than nonintegrated systems.
Open-source Hadoop cannot deliver the integration we are used to because it is not complete. That's one of the reasons IBM introduced IBM® InfoSphere® BigInsights™ a few years ago. We understood then that our customers needed more value on top of the open-source distribution so that it could be more easily consumed in an enterprise.
On April 3, IBM announced several new innovations that address that exact goal. We made several key improvements to InfoSphere BigInsights that are intended to help make big data more consumable, including the introduction of BigSQL, which offers a SQL-like interface for Hadoop.
My favorite Twitter quote from the Strata conference was, "Isn't it ironic how the future of NoSQL has become SQL?" Actually, it's not ironic at all; it's just reality. The truth is that IT departments don't have the luxury of retraining everyone who has mastered SQL.
We have taken simplicity even further, too. On April 3, IBM also announced its intention to release an IBM PureData™ System that is designed to ease adoption of Hadoop in the enterprise. The new system, PureData System for Hadoop, will be the latest member of the IBM PureSystems™ family that helps reduce the cost of IT and enables organizations to use different types of data services. The system will leverage InfoSphere BigInsights and deliver Hadoop with the simplicity of an appliance.
PureData System for Hadoop is designed around three key tenets:
PureData System for Hadoop will help enable the use cases being realized in the current market by simplifying the delivery of Hadoop. Check back for more details on these new solutions in the coming months.
In the meantime, if you have questions or feedback, please leave a note in the comments.