By Leon Katsnelson
By Susan Visser
By Bernie Spang
By the DB2 Guys
By Fred Ho
By Louis T. Cherian
By Shweta Shandilya
By Lawrence Weber
By Serge Rielau
By Dwaine Snow
A multi-temperature data management solution refers in part to having data that is frequently accessed on fast storage (hot data), compared to less-frequently accessed data stored on slightly slower storage (warm data) and rarely accessed data stored on the slowest storage an organization has (cold data). Developing this solution requires a set of key performance indicators (KPIs) to take your data’s “temperature” and to help make both operational and business decisions involving your data. To demonstrate this, assume that your data is segmented and managed as follows:
While it may be reasonable to assume that two-year-old data is less frequently accessed than data that is 90 days old, understanding the frequency of access and change in greater detail can trigger other business decisions. For example, if a month that has had no changes for the past six months had 100 rows changed in an extract, transform, and load (ETL) cycle, you may not want to take any action. Alternatively, if for that same month 10,000 rows changed in an ETL cycle, you might consider one or more of the following actions:
Identifying what data changed, along with the volume and frequency of change, can provide valuable input for operational and business decisions. This article shares some of the available metrics that can help you understand the frequency, volume, percentage of change, and type of actions that constitute changes to your data.
Figure 1 shows a bar graph representation of a table where the months December, November, October, September, August, and July have a higher rate of change and are more heavily accessed than data for the months June, May, April, March, February, and January.Figure 1: Frequency of access versus frequency of change
When the database is activated, the “always-on” metrics in DB2 allow quick and easy reporting of metrics, which can then be used to develop a business view of data access patterns and changed data activity. These metrics are available from the time the database was last activated and can be stored in (user-defined) tables for further analysis.
Figure 2 lists the key metrics that are available using the table function MON_GET_TABLE for each table and each range partition of a table:
These metrics enable you to answer the following questions:
While index metrics do not provide input into data temperature, they still help complete the picture of your data by interpreting index utilization and index performance using the table function MON_GET_INDEX. A subset of those metrics is listed in Figure 3:
Data that is not accessed for an extended period of time is often referred to as “dormant data”—for example:
The system catalog column named LASTUSED is asynchronously updated to store the last date that an object was used or referenced. This information is available for indexes, packages, tables, each range partition of a table, and MQTs. By capturing this information over time, you can identify how long any particular database object has not been accessed.
There are other scenarios for which the LASTUSED column also provides value, such as the dropping of indexes that are not being accessed after a database migration. The following system catalog views contain the LASTUSED column:
In the article “Going Global with Data Mart Consolidation—Part One: Using Row and Column Security to Give Local Views and Correct Currency Aggregation,” database roles were defined for each country referred to in the article as shown in Table 1:
|United Arab Emirates||UNITED_ARAB_EMIRATES_ROLE|
If you defined a workload for each role (country), you could provide many of the same metrics reported at the table level using the table function MON_GET_TABLE—but instead of being reported from an operational point of view, they would be reported from a business point of view. Figure 4 lists some of the key metrics that become immediately available using the table function MON_GET_WORKLOAD once a workload definition is created:
These metrics enable you to answer some of the following business considerations:
To access these metrics, you create a workload definition and then grant use of the workload. Figure 5 shows a sample created workload named HONG_KONG_ ANALYTICS_WORKLOAD for users who are executing an application named ‘Market_Basket_Analysis’ and who have been granted the role ‘HONG_KONG_ROLE’:
SESSION USER_ROLE (‘HONG_KONG_ROLE’)
GRANT USAGE ON WORKLOAD HONG_KONG_ MARKETING_WORKLOAD
TO ROLE HONG_KONG_ROLE
Figure 5: Workload definition for Hong Kong Marketing
After the execution of these two statements (and assuming workload definition selection), metrics will become available for Hong Kong users executing the marketing application.
The same workload metrics for business applications can be provided for ETL. For example, you could define an ETL workload for each country’s ETL. In addition, by identifying each ETL job uniquely, you could return the same metrics listed in Figure 4 not only by country but also by ETL job.
Figure 6 lists the table functions for gathering workload metrics.
Workload metrics help remove the “noise” when it is necessary to report on business-related activity only. For example, Refresh Deferred MQT maintenance contributes to the metric “rows read” for any table queried as part of maintaining an MQT. With workload metrics, you can filter out such activity to get a better approximation of “business query” activity against a table.
A multi-temperature data warehouse solution is based on business requirements, workload management, and intelligent storage provisioning. As shown here, users have a variety of reporting capabilities and metrics available to help understand the utilization of tables, table spaces, and workloads as they relate to the frequency of access and change of data.
DB2 TechTalk: Deep Dive on BLU Acceleration in DB2 10.5, Super Analytics Super Easy
Thursday, May 30: 12:30 – 2:00 PM ET
Informix Chat with the Lab: Primary Storage Manager (PSM) a Parallel Backup Alternative to Ontape
Thursday, May 30: 11:30 – 1 PM ET
Big Data Seminar 2013, Featuring Krish Krishnan
June 14 in New York City
marcus evans Pharma Data Analytics Conference
July 10-11 in Philadelphia
IBM Smarter Content Summit 2013
Big Data at the Speed of Business
Broadcast event replay now available
Information on Demand 2013: Early Bird Registration Now Open
November 3-7 in Las Vegas