Big Data and the Evolving Role of Governance

How data governance leads and data stewards can embrace and extend the role of big data in their organizations

There is a lot of buzz about big data—so much, in fact, that some would even call it hype. Many organizations have appointed individuals to lead their data governance programs. In addition, these companies have business data stewards with deep subject-matter expertise who are accountable for data quality and business definitions. Although it is easy for data governance leads and data stewards to dismiss big data as a fad, they do so at their own peril because they must govern all data, regardless of its size.

These individuals must take three critical actions as their roles evolve over the next 24 months to accommodate big data governance:

  1. Identify big data that already exists within the organization
  2. Understand the impact of emerging regulations on big data and privacy
  3. Articulate the impact of data governance disciplines on big data

 

Identify big data that already exists within the organization

Most organizations already have big data; they just don’t know it. I spoke recently with an insurance carrier that was starting up its data governance program. Its team told me they wanted to focus on information such as customer phone numbers and addresses that had severe data quality issues. However, when I probed a bit further, I learned that they were considering a telematics pilot. As part of this pilot, the insurer was going to offer lower rates to policyholders in exchange for permission to install on-board sensors in their vehicles to monitor driving behavior. For example, the insurer could offer a 20 percent discount on an auto insurance premium if the sensor showed that the policyholder drove no faster than 60 miles per hour.

The insurer anticipated that it would be overwhelmed by the volume of data from the vehicle sensors, so it had to establish a policy on how long to retain telematics data. Other industries hold different types of big data, such as social media, clickstreams, and unstructured information. Data governance leads and data stewards need to identify and take stock of all of this data.
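As a rough illustration of how the telematics example could translate into rules that a data steward can review, here is a minimal Python sketch. The 60 miles per hour threshold and the 20 percent discount come from the example above. The retention period, the `SensorReading` fields, and both function names are assumptions made up for this sketch, not details of the insurer's actual pilot.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# The speed threshold and discount come from the example in the text.
# RETENTION_DAYS is a hypothetical value; the insurer's real policy is not stated.
SPEED_LIMIT_MPH = 60
DISCOUNT_RATE = 0.20
RETENTION_DAYS = 365


@dataclass
class SensorReading:
    """One hypothetical reading from an on-board vehicle sensor."""
    policy_id: str
    recorded_on: date
    speed_mph: float


def discounted_premium(base_premium, readings):
    """Apply the discount only if there are readings and none exceeds the limit."""
    if readings and all(r.speed_mph <= SPEED_LIMIT_MPH for r in readings):
        return round(base_premium * (1 - DISCOUNT_RATE), 2)
    return base_premium


def expired_readings(readings, today):
    """Return the readings that are older than the retention period."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [r for r in readings if r.recorded_on < cutoff]
```

Writing the policy down this explicitly makes the governance questions visible: who sets the retention period, and what happens to readings once they pass the cutoff.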

 

Understand the impact of emerging regulations on big data and privacy

A number of regulations are starting to address privacy concerns about the use of big data. For example, utility smart meters collect information about the use of electricity at intervals of an hour or less, and these readings can be compiled to create detailed profiles of households. As a result, American public utility commissions and the Article 29 Data Protection Working Party in the European Union have issued rules and guidance regarding the appropriate use of smart meter data by utilities. Data governance teams need to understand the impact of these regulations on a utility’s data management environment.

In the life sciences industry, the United States Food and Drug Administration has also issued detailed guidelines for specific processes companies must adopt when responding to public, unsolicited requests on social media for off-label information.

Banks also need to consider the implications of regulations such as the United States Fair Credit Reporting Act, which governs the type of information that can be used to make credit decisions about individuals. If banks draw on social media data when making those decisions, it can be hard for them to prove later that they did not use prohibited information.

The list goes on and on. In many cases, the impact of these regulations is not fully understood. Data governance leads and data stewards need to work with key stakeholders from the business, legal, and privacy areas to establish policies regarding the acceptable use of big data.

 

Articulate the impact of data governance disciplines on big data

The core data governance disciplines of data quality, metadata, privacy, and information lifecycle management also apply to big data. However, these disciplines work differently for big data than they do for small data. Data quality is a good example: organizations that use tweets to conduct reputation analysis need to consider whether the data set is truly representative of their customers:

  • Are disgruntled customers more likely to use Twitter?
  • Do younger customers use Twitter more often?
  • Are affluent customers more likely to use Twitter?

The social listening department at one high-end retailer had to address senior management’s concerns about whether Twitter users belonged to a different demographic from the retailer’s traditional customers, who were primarily female, over 30, and from households with incomes above $100,000. The department conducted marketing surveys and found, to its surprise, that the demographics of its Twitter users were actually very similar to those of its traditional customers. Armed with this survey data, the social listening department attracted more attention and a bigger budget from senior management.
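A representativeness check like the retailer's can be sketched as a simple comparison of segment shares between the two groups. The Python sketch below is illustrative only. The three segments mirror the demographics named above, while the record fields (`gender`, `age`, `income`) and the function names are hypothetical and would need to match whatever the survey data actually contains.

```python
def share(records, predicate):
    """Fraction of records that satisfy the predicate."""
    if not records:
        raise ValueError("records must not be empty")
    return sum(1 for r in records if predicate(r)) / len(records)


def demographic_gaps(customers, twitter_users, segments):
    """For each segment, return the Twitter share minus the customer share."""
    return {
        name: round(share(twitter_users, pred) - share(customers, pred), 3)
        for name, pred in segments.items()
    }


# Segments taken from the retailer example; the field names are assumptions.
SEGMENTS = {
    "female": lambda r: r["gender"] == "F",
    "over_30": lambda r: r["age"] > 30,
    "income_over_100k": lambda r: r["income"] > 100_000,
}
```

Gaps close to zero would suggest that the Twitter audience resembles the traditional customer base on those segments. How large a gap is acceptable is a business judgment that this sketch does not answer.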

Big data is here to stay, and its use is only going to become more pervasive within organizations. At the same time, more and more enterprises are appointing full-time data governance leads and data stewards. These companies need to embrace big data and extend its role to derive maximum value and avoid being left behind.

Do you agree? Which issues do you see as most critical for governance leads and data stewards as they begin working with big data?

Sunil Soares

Sunil Soares is the founder and managing partner of Information Asset, LLC, a consulting firm that specializes in helping organizations build out their information governance programs. Prior to this role, Sunil was the Director of Information Governance at IBM, and worked with clients across six continents and multiple industries.

Sunil has published a book called The IBM Data Governance Unified Process that details the fourteen steps and almost one hundred sub-steps to implement an information governance program. The book is currently in its second printing and has also been translated into Chinese.

Sunil’s second book, Selling Information Governance to the Business: Best Practices by Industry and Job Function, reviews the best way to approach information governance by industry and function.

His third book, Big Data Governance, will review the importance of information governance for different types of big data such as social media, machine-to-machine, big transaction data, biometrics, and human-generated data. Sunil has also worked at the Financial Services Strategy Consulting Practice of Booz Allen & Hamilton in New York. He lives in New Jersey and holds an MBA in Finance and Marketing from the University of Chicago Booth School of Business.