
Bridging the Data Divide

Using reference data management to reduce cost, complexity, and business risk
Special thanks to Erik O’Neill, Rick Clements, and Anju Willard, who contributed to this article.

It starts simply enough. A U.S. company implements a business application that uses North American Industry Classification System (NAICS) code tables to categorize data. But when the company tries to integrate the new application, the IT department discovers that the company’s existing applications use Standard Industrial Classification (SIC) codes instead. If the company goes forward with its plans to do business in Europe, it will need to use European classification codes as well.

From there, things can get complicated. Take the NACE codes, for example: the first four digits of each code, which denote the first four levels of the classification system, are the same in all European countries. However, the fifth digit varies from country to country. Further digits are often added by individual suppliers of applications and databases. And each application may have 10 or more different code tables.
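To make the structure concrete, here is a minimal Python sketch that separates the four shared digits of a NACE-style code from any national or vendor-specific extension. The function names and the sample values used with them are illustrative assumptions, not part of any official NACE tooling.

```python
from typing import NamedTuple


class NaceParts(NamedTuple):
    shared: str     # first four digits, common across European countries
    extension: str  # fifth and later digits: national or vendor-specific


def split_nace(code: str) -> NaceParts:
    """Split a NACE-style code into its shared prefix and local extension."""
    # Keep digits only, so "62.01.1" and "62011" are treated alike.
    digits = "".join(ch for ch in code if ch.isdigit())
    if len(digits) < 4:
        raise ValueError(f"expected at least four digits, got {code!r}")
    return NaceParts(shared=digits[:4], extension=digits[4:])


def same_shared_class(a: str, b: str) -> bool:
    """True when two codes agree on the four shared digits."""
    return split_nace(a).shared == split_nace(b).shared
```

Comparing only the shared prefix lets codes from two countries be matched at the common level even when their fifth digits differ.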

 

Overcoming the challenges hidden in your applications

Consisting of codes and their corresponding descriptions, reference data is found in practically every enterprise application including back-end systems, front-end commerce applications, and the data warehouse. It has a major impact on everything from the integrity of business intelligence reports to the success or failure of system integration efforts.

Yet for something so consequential, reference data has a relatively low profile. Often, it is not until change arrives, in the form of a new regulation, a corporate merger, or the need to integrate previously separate systems, that these code tables emerge as the linchpins they truly are. Mission-critical applications like product lifecycle management (PLM), enterprise resource planning (ERP), and customer relationship management (CRM) use reference data to classify and categorize business-critical data such as product and customer information.

The underlying challenge is that code tables are typically defined and managed on an application-by-application basis, and the format and representation of the codes within those tables differ from one application to the next.

To move data from one application to another, or to look at data outside the context of its original application, IT staff must map those codes to a variety of different values and representations. This mapping makes it possible to combine data from different platforms, for example to answer a business intelligence (BI) query.
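A small Python sketch shows what such a mapping can look like. It assumes three hypothetical applications (called "crm", "erp", and "billing" here) that store country codes in the ISO 3166-1 two-letter, three-letter, and numeric forms respectively; the crosswalk translates each local code to one canonical form so that records can be counted together.

```python
from collections import Counter

# Hypothetical crosswalk: (application, local code) -> canonical code.
# The canonical form chosen here is the ISO 3166-1 two-letter code.
CROSSWALK = {
    ("crm", "US"): "US",
    ("crm", "DE"): "DE",
    ("erp", "USA"): "US",
    ("erp", "DEU"): "DE",
    ("billing", "840"): "US",
    ("billing", "276"): "DE",
}


def to_canonical(app: str, local_code: str) -> str:
    """Translate an application-specific code to the canonical code."""
    try:
        return CROSSWALK[(app, local_code)]
    except KeyError:
        # Fail loudly: a silent default would corrupt downstream reports.
        raise LookupError(f"no mapping for {local_code!r} in {app!r}") from None


def count_by_country(records):
    """Count (application, local code) records by canonical country code."""
    counts = Counter(to_canonical(app, code) for app, code in records)
    return dict(counts)
```

Raising an error on an unmapped code is a deliberate choice in this sketch: an unknown value is surfaced immediately rather than quietly dropped from a report.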

 

Dealing with change and risk

Another challenge is that reference data changes over time: external standards bodies issue periodic updates, and each update must be applied wherever the affected codes are used. Without an automated process for managing changes to related code tables and mappings across the enterprise, IT may need to update each application manually, which is time-consuming, costly, and error-prone. Manual processes also make it difficult to create an audit trail, provide a standard review process, or implement consistent security policies, and those gaps present regulatory risk.

With applications and systems located in different parts of the enterprise, lack of centralized management means reference data is managed in multiple places, which often results in duplication and introduces added potential for error. In fact, there may be no process for agreeing on changes to reference data across the business. Ownership of the process may be unclear, with limited business involvement and IT simply performing ad hoc, application-specific code table maintenance.

 

Looking for a less-costly approach

Organizations looking for a more efficient, less-costly, and lower-risk approach to these challenges are considering reference data management (RDM) solutions. RDM allows different versions of codes to be managed from a central point, simplifies the creation of mappings between the different versions, and enables transcoding of values across data sets. Cross-enterprise reference data can then be reconciled for application integration, business process integration, statutory and non-statutory reporting, compliance, and BI analysis.
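As a rough illustration of version-to-version mapping and transcoding, the Python sketch below models a mapping between two versions of a code set and applies it to a data set. The class, the function, and the codes ("A1", "B1", and so on) are hypothetical placeholders, not taken from any real standard or product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VersionMapping:
    """Mapping from codes in one version of a code set to another version."""
    source_version: str
    target_version: str
    # source code -> target codes (one code may split into several)
    pairs: dict[str, list[str]] = field(default_factory=dict)


def transcode_dataset(values, mapping):
    """Transcode values; return (translated, needs_review).

    Codes with exactly one target are translated automatically. Codes with
    no target, or with several candidate targets, are set aside for review.
    """
    translated, needs_review = [], []
    for value in values:
        targets = mapping.pairs.get(value, [])
        if len(targets) == 1:
            translated.append(targets[0])
        else:
            needs_review.append(value)
    return translated, needs_review
```

Setting ambiguous or unmapped codes aside, rather than guessing, reflects the fact that a new version of a standard can split or retire codes, so not every value has a single obvious successor.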

RDM works by treating reference data as a special type of master data and applying proven master data management (MDM) techniques including governance, security, and audit control. The RDM solution enables efficient management of the complex mappings between different reference data representations across the enterprise and coordinates the use of reference data standards within the organization. An RDM hub, accessed through a user interface, provides a centralized authoring and approval process, publishes data changes to enterprise systems, and performs exception handling (see Figure 1).

Figure 1: Role of the reference data management (RDM) hub.
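The hub's responsibilities (authoring, approval, publishing, and exception handling) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular RDM product is implemented: the class name, method names, and the callback-based subscription are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceDataHub:
    published: dict = field(default_factory=dict)    # set name -> {code: description}
    drafts: dict = field(default_factory=dict)       # set name -> pending change
    subscribers: dict = field(default_factory=dict)  # set name -> list of callbacks
    audit_log: list = field(default_factory=list)    # (event, set name, detail)

    def subscribe(self, set_name, callback):
        """Register an application callback for changes to one code set."""
        self.subscribers.setdefault(set_name, []).append(callback)

    def propose(self, set_name, codes, author):
        """Author a change; nothing reaches applications until approval."""
        self.drafts[set_name] = dict(codes)
        self.audit_log.append(("proposed", set_name, author))

    def approve_and_publish(self, set_name, approver):
        """Promote the draft, notify subscribers, and return delivery failures."""
        if set_name not in self.drafts:
            raise KeyError(f"no draft pending for {set_name!r}")
        self.published[set_name] = self.drafts.pop(set_name)
        self.audit_log.append(("approved", set_name, approver))
        failures = []
        for callback in self.subscribers.get(set_name, []):
            try:
                # Each subscriber gets its own copy of the approved codes.
                callback(set_name, dict(self.published[set_name]))
            except Exception as exc:
                # One failing application must not block the others.
                failures.append((callback, exc))
                self.audit_log.append(("delivery_failed", set_name, repr(exc)))
        return failures
```

Every step writes to the audit log, and a failed delivery is recorded and returned rather than aborting the publish, which mirrors the separation between approval, distribution, and exception handling described above.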

 

Replacing scattered processes with a centralized hub

With centralized management of reference data, organizations can benefit in multiple ways. They gain the auditability, provenance, and security of reference data that is required for external reporting and regulatory compliance. The chance of costly errors associated with manual, ad hoc processes—for example, someone loading the wrong version of a reference data spreadsheet or file into a process—is greatly reduced. Making changes to reference data and distributing those changes to applications that consume the data is faster and more efficient.

Healthcare organizations offer a case in point. The healthcare industry has many of the same reference data management issues as other industries. However, in addition to the country codes, state codes, gender codes, and other code sets common in business, the healthcare industry has its own medical code sets and terminologies that require ongoing changes and reference data management. With RDM, for example, a healthcare organization can more easily align different vocabularies such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) and the International Classification of Diseases (ICD), or facilitate the government-mandated changeover from ICD version 9 to version 10.

 

Competing in a data-driven world

IBM is a thought leader on data governance and was one of the first to see the potential benefits of applying a master data management approach to reference data. Built on IBM’s proven InfoSphere Master Data Management platform, the IBM InfoSphere MDM Reference Data Management solution is designed to put management of reference data into the hands of business users. An application subscription model provides control over how data is on-boarded and distributed to subscribing applications. Intuitive user interfaces and workflow processes help simplify review, approval, and publishing of reference data changes.

The result is a reduced burden on IT and improvement in the overall quality of data used across the organization. Ultimately, an effective RDM solution means better products or services for customers and an organization that is better able to compete in today’s data-rich environment.


Chris Young

Chris Young, whose career spans 25 years with high tech manufacturing and marketing companies, has written for a wide variety of technology publications.


  • http://www.softwareag.com/corporate/products/wm/mdm/overview/default.asp Nancy Beckman

    “With applications and systems located in different parts of the enterprise, lack of centralized management means reference data is managed in multiple places, which often results in duplication and introduces added potential for error. ”

    You hit the nail on the head. And what happens when data is duplicated wrong? Different parts of your enterprise are running on different assumptions, which only compounds the problem and makes it even more complicated to undo.