To date, only a few visionary users have begun to connect Hadoop directly to their companies’ business intelligence (BI) strategies. However, their numbers will certainly grow as Hadoop matures into a more robust platform for traditional operational BI applications, while continuing to serve core applications in advanced analytics.
It’s not far-fetched to start thinking of Hadoop platforms, such as IBM InfoSphere BigInsights, as the nucleus of your next-generation enterprise data warehouse (EDW). Likewise, you should view Hadoop’s primary users and developers, data scientists, as the harbingers of the next generation of BI users: business analysts who demand the ability to explore an order of magnitude more data and to build, score, and deploy more complex analytic models in a growing range of mission-critical applications. The new generation of knowledge workers insists on having their own “sandboxes” of deep data to explore as they see fit.
Even by the end of this decade, the majority of traditional BI requirements will be adequately served by traditional BI tools and supporting EDWs, cubes, marts, and other analytic databases such as IBM Netezza. Even in the year 2020, your primary Hadoop uses may be for operational applications that are either outside the traditional BI realm, such as customer experience management and marketing campaign optimization, or in a supporting role, such as unstructured content transformation.
But Hadoop will almost certainly creep into a broader range of leading-edge BI requirements—especially those that involve statistical analysis, predictive analytics, or natural language processing. You’ll probably also incorporate other big data tools and platforms—such as massively parallel in-memory, document, and graph databases—into hybrid environments where Hadoop plays an important (but not paramount) role.
Clearly, the definition of “traditional BI” will continue to grow as more big data-centric approaches come into the mainstream. People often use the term “business analytics” to allude to the growing functional scope of mainstream BI tools.
Several types of emerging requirements will drive adoption of Hadoop and kindred approaches throughout the rest of this decade, with traditional business analytics—i.e., BI that provides decision support—as a key focus:
Clearly, we’re well beyond old-school BI here, but the world has evolved into the age of petabyte-scale advanced analytics at our fingertips. A new BI future is rapidly emerging.
However, that doesn’t mean you need to jettison established tools and approaches if they’re still addressing your organization’s needs. If your needs in these areas are specialized and demand the full arsenal of the professional data scientist, you will almost certainly need to use power tools such as IBM SPSS. But if you need basic features in any or all of these areas that work out of the box with your reporting, query, and other traditional BI tools, the next-generation Hadoop-enabled BI platform is for you.
And let’s not forget the simplicity equation, without which big data may very well become a haystack for buried (albeit golden) nuggets of intelligence. One potential downside of big data is that the sheer volume, velocity, and variety of data can easily overwhelm the poor analyst who is trying to find an actionable kernel of intelligence. Humans can’t easily navigate petabytes, and information overload is always a very real risk when you’re simply dumping data indiscriminately into your Hadoop clusters.
As you implement the next generation of Hadoop-enabled BI environments, you must take pains to ensure a simple, seamless, and productive experience for the average knowledge worker. Line-of-business users will balk at big data if you don’t deliver targeted intelligence to their tablets, smartphones, and other devices for fast consumption.
Many of the usability features of today’s top BI platforms, such as IBM Cognos, will be fundamental to this new era. Hadoop-centric BI will rely on self-service, in-memory, predictive, portable, and personalizable client tools. The emphasis will be on interactive visualization, semantic search, and data virtualization to ensure simple but rich exploratory experiences.
Don’t worry. Your average user won’t need to learn how to program in MapReduce, Pig, or any of the other Hadoop programming models and languages. All of this big data “plumbing” will be submerged in a highly visual next-generation BI experience similar to what you’ve grown accustomed to in Cognos and other analytics tools. And it’s a fair bet that your next-generation BI platform will come with productivity accelerators: in other words, embedded MapReduce and other big data models, views, and tools geared to common analytical needs.
The next-generation Hadoop-enabled BI platform will also be extensible. It will support collaborative development of MapReduce and other analytic models by data scientists, business analysts, and other knowledge workers working in social collaboration contexts. Developer productivity will grow as next-generation big data platforms automate more of the grunt work of data discovery, preparation, aggregation, segmentation, modeling, and scoring.
In other words, Hadoop and other new big data technologies will be the foundation for an evolved massively parallel processing (MPP) EDW, not dissimilar to IBM Smart Analytics System or IBM Netezza.
Taken together, all of the new approaches that we’ve discussed are transforming your BI environment, now and through the rest of this decade, into an ever more powerful brilliance infrastructure.