Big Data and Warehousing

Is Your Big Data Hot, Warm, or Cold?

Multi-temperature workload management in pictures

In part one of this series, the focus was on developing a set of key performance indicators (KPIs) to better understand the “temperature” of your data. Table 1 describes the generally accepted classifications used when discussing multi-temperature data or storage.

Classification Description
Hot Frequently accessed
Warm Less frequently accessed
Cold Rarely accessed
Table 1: Multi-temperature data classifications

 

This article provides a high-level view of how workload management (WLM) can both help and enforce the prioritization of queries based on business requirements and the “temperature” of the data being accessed.

 

Managing a multi-temperature workload

Let’s see if a picture is really worth a thousand words. AAs shown in Table 2, we can assume the following definitions for high, normal, and low priority as it they relates to the temperature of the data being accessed.

Priority Data temperature Description
High 1 to 3 months Very few controls in place to manage queries. Concurrency controls in place as a way to throttle back if and when required. Thresholds used primarily as an alert mechanism.
Normal 4 to 9 months Concurrency controls in place to ensure that high-priority queries are not impacted. Thresholds used both as alerts and as a way to abort queries.
Low 10 months or more Very limited concurrency. Thresholds used to abort queries.
Table 2: Definition of data temperature priorities

 

The workflow in Figure 1 shows management of workloads by the age (or temperature) of the data. In this example, data that is less than 4 months old when queried is high priority; data that is between 4 and 10 months old is normal priority; and data that is 10 months and older is low priority.

Figure 1: Multi-temperature workload management

 

Implementing concurrency controls and thresholds

Figure 2 shows concurrency controls that determine the number of concurrent queries that can be executed for a given priority.

Figure 2: Implementing queues and thresholds in a workload management workflow.

 

If there are more queries than the number allowed to execute concurrently, those queries are put into a queue. When the number of queries concurrently executing drops below the concurrency level specified, the queries that were queued up are automatically released to execute. Figure 2 also shows that it is possible to define thresholds, while Table 3 lists a subset of the thresholds that can be implemented along with the actions that are available for each threshold. A complete list of available thresholds and descriptions can be found in the IBM® DB2® 10.1 Information Center for Linux, UNIX, and Windows in the section called “Control of work with thresholds.”

Threshold name Available actions
Number of concurrent activities Continue or stop execution
Number of rows returned Continue or stop execution
Number of rows read Continue or stop execution
Estimated SQL cost Continue or stop execution
Unit of work time Continue or stop execution or force application
Activity total time Continue or stop execution
CPU time Continue or stop execution
Amount of temporary space Continue or stop execution
Amount of aggregate temporary space Continue or stop execution
Connection idle time Continue or stop execution
Number of concurrent connections Continue or stop execution
Table 3: Potential workload management thresholds

 

Going global

For those familiar with workload management, Figure 2 shows queuing implemented at the service class level. You can also implement queuing at other levels in DB2 such as the workload definition level. Using the countries listed in the article “Going Global with Data Mart Consolidation,”  you can also define concurrency controls (and other WLM constructs) in which each country is an individual workload with its own set of concurrency controls. For simplicity, only the concurrency controls for China and Brazil are shown in Figure 3.

Figure 3: Defining concurrency controls for China and Brazil.

 

Using multi-temperature workload management with storage groups

Storage groups are an excellent fit for the implementation and management of a WLM multi-temperature storage scheme. Figure 4 shows data that is range-partitioned in table spaces defined using the storage groups HOT, WARM, and COLD. Each is defined with a different type of storage.

Figure 4: Storage group and table space configuration.

 

While the implementation details are not covered in this article, note that DB2 WLM can identify which table spaces any particular query is accessing and then act accordingly to prioritize and manage each query. The examples in Figures 1, 2, and 3 apply directly to the configuration depicted in Figure 4 from a workload management point of view.

In my next article, I will demonstrate how to use SQL to implement a multi-temperature workload management solution based on business requirements. In the meantime, please feel free to share your experiences with multi-temperature workload management in the comments.

Previous post

IBM Smarter Analytics Signature Solutions

Next post

Netezza Migration Kung-Fu: Part 2

Dan Gibson

From his position as the DB2 Lab Advocate at the Atlanta, Nagano, and Sydney Olympic Games to his Master of Science Degree in Computer Science, Dan knows technology—specifically DB2—inside and out. Since joining IBM over a decade ago, Dan has dedicated his career to diving in with DB2.

  • Rodolfo Neuhaus

    Very good and very understandable.