Markets today are abuzz with news, anecdotes, and rumors of the purported omnipresence and omniscience of big data. While marketers are busy formulating ways to monetize the vastness of available zettabytes, data scientists are burning the midnight oil to harness new technologies (such as streaming, Hadoop, and NoSQL stores), commodity hardware, and cloud computing to transform the world.
Organizations see these technologies as game changers, especially since several of them support data in its native format without the need for transformation or modeling before questions can be asked. At this point in the big data lifecycle, organizations do not always know which data sources have value and they do not necessarily want to invest vast resources to gather requirements and sponsor formal information governance programs.
Clearly, as the exploratory phase of big data “skunk works” projects drives business value and leads to formal initiatives, organizations will turn their attention to the fundamental questions within information management:
All of this hoopla about big data raises more questions for the CIO than he or she may be prepared for. In our experience, many organizations justify the lack of adequate governance policies on the grounds that big data is somehow “different,” which we believe sidesteps the issue. Simply stated, as big data technologies become operational rather than exploratory, they need the same governance disciplines as traditional approaches to data management.
One of the first steps in the implementation of an information governance program is to assess the current state and the desired future state of maturity. According to Banu Ekiz, Vice President of Business Intelligence, Akbank Information Technologies Turkey, “Big data has all the characteristics of ‘small data’ when it comes to governance. The only difference is in the complexity and variety of channels that it comes from. Although there are greater demands on organizational energy and resources to govern big data, the gain in terms of business value is that much higher. Being able to analyze big data from the web, and take necessary actions, can have a major impact on a company’s profit. A maturity model for big data governance is a critical first step in this journey.”
We have leveraged the eleven categories of the IBM Information Governance Council Maturity Model (see figure). The following is a sample set of questions to assess the maturity of big data governance:
According to Nina Vredevoogd, Manager, IT Planning & Program Management – Concur Technologies, “Big data is global. The concept of privacy, laws and regulations around data are not. For global companies, developing a comprehensive information management program and policies governing big data is imperative. Consumers are becoming more concerned about online privacy. Companies that adopt and actively market responsible policies to control access to consumer data, will likely gain competitive advantage in the rapidly growing world of online commerce.”
According to Jay Yusko, Ph.D., Vice President of Technology Research at SymphonyIRI Group, “Information governance for big data is an absolute necessity. By its very nature, big data is developed from many disparate sources that need to be integrated to be useful as information that can be analyzed. To make this integration possible, the data from all the different sources need to be standardized with the same set of rules and then validated and monitored. This is really the heart of the information governance program for big data.”
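To make the standardize-then-validate idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a description of any vendor's tooling: the source names (`web`, `crm`), the field names, and the helper functions `standardize`, `validate`, and `integrate` are all hypothetical. The sketch maps each source onto one shared set of field names, applies the same normalization rule to every value, and sets aside records that fail validation so they can be counted and monitored.

```python
def standardize(record, field_map):
    """Rename source-specific fields to shared names and normalize strings."""
    out = {}
    for source_field, shared_field in field_map.items():
        value = record.get(source_field)
        if isinstance(value, str):
            # Same rule for every source: trim whitespace, lowercase.
            value = value.strip().lower()
        out[shared_field] = value
    return out


def validate(record, required):
    """Return the required fields that are missing or empty."""
    return [f for f in required if record.get(f) in (None, "")]


def integrate(sources, field_maps, required):
    """Standardize every record, then split into accepted and rejected."""
    accepted, rejected = [], []
    for name, records in sources.items():
        for record in records:
            clean = standardize(record, field_maps[name])
            missing = validate(clean, required)
            if missing:
                # Keep the source name so rejections can be monitored per source.
                rejected.append((name, clean, missing))
            else:
                accepted.append(clean)
    return accepted, rejected


# Hypothetical sample data from two differently shaped sources.
sources = {
    "web": [{"Email": " Ann@Example.com ", "Country": "TR"}],
    "crm": [{"email_addr": "bob@example.com", "ctry": ""}],
}
field_maps = {
    "web": {"Email": "email", "Country": "country"},
    "crm": {"email_addr": "email", "ctry": "country"},
}
accepted, rejected = integrate(sources, field_maps, ["email", "country"])
```

In this sketch the web record is accepted after normalization, while the CRM record is rejected because its country field is empty. Tracking the rejected list per source over time is one simple way to approach the monitoring step.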
In summary, organizations need to treat big data as an enterprise asset similar to other data types. As a general rule of thumb, if it is a governance consideration with a database or warehouse, then it is a governance consideration with big data technologies as well.
What do you think? Let us know in the comments.