There is a lot of buzz about big data—so much, in fact, that some would even call it hype. Many organizations have appointed individuals to lead their data governance programs. In addition, these companies have business data stewards with deep subject-matter expertise who are accountable for data quality and business definitions. Although it is easy for data governance leads and data stewards to dismiss big data as a fad, they do so at their own peril because they must govern all data, regardless of its size.
These individuals must take three critical actions as their roles evolve over the next 24 months to accommodate big data governance:
Most organizations already have big data—they just don’t know it. I spoke to an insurance carrier recently that was starting up its data governance program. Their team told me they wanted to focus on information such as customer phone numbers and addresses that had severe data quality issues. However, when I probed a bit further, I learned that they were considering a telematics pilot. As part of this pilot, the insurer was going to offer lower rates to policyholders in exchange for permission to put on-board sensors on vehicles that monitored drivers’ behavior on the road. For example, the insurer could offer a 20 percent discount on an auto insurance premium if the sensor showed that the policyholder drove no faster than 60 miles per hour.
The insurer anticipated that it would be overwhelmed with a large amount of data from the vehicle sensors, so it had to establish a policy regarding the retention period for telematics data. Other industries have a lot of other types of big data such as social media, clickstreams, and unstructured information. Data governance leads and data stewards need to get their arms around all of this data.
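A retention policy of the kind the insurer needed can be expressed as a simple rule over timestamps. The following is a minimal sketch, not the insurer's actual system: the `TelematicsReading` record, its fields, and the 90-day window in the usage example are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TelematicsReading:
    """Hypothetical sensor record; field names are assumptions."""
    vehicle_id: str
    recorded_at: datetime
    speed_mph: float


def apply_retention(readings, now, retention_days):
    """Split readings into those inside and outside the retention window.

    Returns (kept, expired). Readings recorded exactly at the cutoff are kept.
    """
    cutoff = now - timedelta(days=retention_days)
    kept = [r for r in readings if r.recorded_at >= cutoff]
    expired = [r for r in readings if r.recorded_at < cutoff]
    return kept, expired


if __name__ == "__main__":
    # Illustrative data only; the 90-day period is an assumed policy value.
    now = datetime(2024, 1, 31)
    readings = [
        TelematicsReading("vehicle-1", datetime(2024, 1, 30), 55.0),
        TelematicsReading("vehicle-1", datetime(2023, 1, 1), 62.0),
    ]
    kept, expired = apply_retention(readings, now, retention_days=90)
    print(len(kept), "kept;", len(expired), "eligible for deletion")
```

In practice, the retention period itself is a governance decision that the business, legal, and privacy stakeholders would set; the code only enforces whatever value they agree on.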
A number of regulations are starting to address privacy concerns about the use of big data. For example, utility smart meters collect information about the use of electricity at intervals of an hour or less, which can be compiled to create detailed profiles of households. As a result, American public utility commissions and the Article 29 Data Protection Working Party in the European Union have issued rules and guidance regarding the appropriate use of smart meter data by utilities. Data governance teams need to understand the impact of these regulations on a utility’s data management environment.
In the life sciences industry, the United States Food and Drug Administration has also issued detailed guidelines for specific processes companies must adopt when responding to public, unsolicited requests on social media for off-label information.
Banks also need to consider the implications of regulations such as the United States Fair Credit Reporting Act, which governs the type of information that can be used to make credit decisions on individuals. If banks draw on social media data for those decisions, it can be hard for them to later prove that they did not use prohibited information.
The list goes on and on. In many cases, the impact of these regulations is not fully understood. Data governance leads and data stewards need to work with key stakeholders from the business, legal, and privacy areas to establish policies regarding the acceptable use of big data.
The core data governance disciplines of data quality, metadata, privacy, and information lifecycle management also apply to big data. However, application of these principles works differently than it does with small data. For example, organizations that use tweets to conduct reputation analysis also need to consider whether the data set is truly representative of their customers:
The social listening department at one high-end retailer had to address senior management’s concerns about whether Twitter users were in a different demographic from their traditional customers, who were primarily female, over 30, and with a household income over $100,000. The social listening department conducted marketing surveys and found, to their surprise, that the demographics of their Twitter users were actually very similar to their traditional customers. Armed with this survey data, the social listening department attracted more attention and a bigger budget from senior management.
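The retailer in this example answered the representativeness question with marketing surveys. One simple way to summarize such survey results, offered here only as an illustrative sketch and not as the retailer's method, is to compare the share of each demographic segment in the two groups. The segment labels and counts below are hypothetical, and the total variation distance is a technique chosen for this illustration.

```python
def segment_shares(counts):
    """Convert raw counts per segment into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one observation")
    return {segment: n / total for segment, n in counts.items()}


def total_variation_distance(counts_a, counts_b):
    """Return a value in [0, 1]: 0 means identical segment mix, 1 means no overlap."""
    shares_a = segment_shares(counts_a)
    shares_b = segment_shares(counts_b)
    segments = set(shares_a) | set(shares_b)
    return 0.5 * sum(
        abs(shares_a.get(s, 0.0) - shares_b.get(s, 0.0)) for s in segments
    )


if __name__ == "__main__":
    # Hypothetical survey counts, invented for illustration only.
    traditional_customers = {"female_over_30": 700, "other": 300}
    twitter_users = {"female_over_30": 660, "other": 340}
    distance = total_variation_distance(traditional_customers, twitter_users)
    print(f"Demographic distance: {distance:.2f}")
```

A small distance suggests the social data set resembles the traditional customer base on the segments measured; what counts as "small enough" is a judgment the data stewards and the business would need to make.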
Big data is here to stay, and its use is only going to become more pervasive within organizations. At the same time, more and more enterprises are appointing full-time data governance leads and data stewards. These companies need to embrace big data and extend their governance programs to cover it, both to derive maximum value and to avoid being left behind.
Do you agree? Which issues do you see as most critical for governance leads and data stewards as they begin working with big data?