Is Your Big Data Hot, Warm, or Cold?
In part one of this series, the focus was on developing a set of key performance indicators (KPIs) to better understand the temperature of data. Table 1 describes the generally accepted classifications used when discussing multi-temperature data or storage.Table 1: Multi-temperature data classifications
|Warm||Less frequently accessed|
This article provides a high-level view of how workload management (WLM) can both help and enforce the prioritization of queries based on business requirements and the temperature of the data being accessed.
Managing a multi-temperature workload
Let’s see if a picture is really worth a thousand words. AAs shown in Table 2, we can assume the following definitions for high, normal, and low priority as it they relates to the temperature of the data being accessed.Table 2: Definition of data temperature priorities
|High||1 to 3 months||Very few controls in place to manage queries. Concurrency controls in place as a way to throttle back if and when required. Thresholds used primarily as an alert mechanism.|
|Normal||4 to 9 months||Concurrency controls in place to ensure that high-priority queries are not impacted. Thresholds used both as alerts and as a way to abort queries.|
|Low||10 months or more||Very limited concurrency. Thresholds used to abort queries.|
The workflow in Figure 1 shows management of workloads by the age—or temperature—of the data. In this example, data that is less than 4 months old when queried is high priority; data that is between 4 and 10 months old is normal priority; and data that is 10 months and older is low priority.Figure 1: Multi-temperature workload management
Implementing concurrency controls and thresholds
Figure 2 shows concurrency controls that determine the number of concurrent queries that can be executed for a given priority.Figure 2: Implementing queues and thresholds in a workload management workflow.
If there are more queries than the number allowed to execute concurrently, those queries are put into a queue. When the number of queries concurrently executing drops below the concurrency level specified, the queries that were queued up are automatically released to execute. Figure 2 also shows that it is possible to define thresholds, while Table 3 lists a subset of the thresholds that can be implemented along with the actions that are available for each threshold. A complete list of available thresholds and descriptions can be found in the IBM® DB2® 10.1 Information Center for Linux, UNIX, and Windows in the section called “Control of work with thresholds.”Table 3: Potential workload management thresholds
|Threshold name||Available actions|
|Number of concurrent activities||Continue or stop execution|
|Number of rows returned||Continue or stop execution|
|Number of rows read||Continue or stop execution|
|Estimated SQL cost||Continue or stop execution|
|Unit of work time||Continue or stop execution or force application|
|Activity total time||Continue or stop execution|
|CPU time||Continue or stop execution|
|Amount of temporary space||Continue or stop execution|
|Amount of aggregate temporary space||Continue or stop execution|
|Connection idle time||Continue or stop execution|
|Number of concurrent connections||Continue or stop execution|
For those familiar with workload management, Figure 2 shows queuing implemented at the service class level. You can also implement queuing at other levels in DB2 such as the workload definition level. Using the countries listed in the article “Going Global with Data Mart Consolidation,” you can also define concurrency controls (and other WLM constructs) in which each country is an individual workload with its own set of concurrency controls. For simplicity, only the concurrency controls for China and Brazil are shown in Figure 3.Figure 3: Defining concurrency controls for China and Brazil.
Using multi-temperature workload management with storage groups
Storage groups are an excellent fit for the implementation and management of a WLM multi-temperature storage scheme. Figure 4 shows data that is range-partitioned in table spaces defined using the storage groups HOT, WARM, and COLD. Each is defined with a different type of storage.Figure 4: Storage group and table space configuration.
While the implementation details are not covered in this article, note that DB2 WLM can identify which table spaces any particular query is accessing and then act accordingly to prioritize and manage each query. The examples in Figures 1, 2, and 3 apply directly to the configuration depicted in Figure 4 from a workload management point of view.
Look for an upcoming article that demonstrates how to use SQL to implement a multi-temperature workload management solution based on business requirements. In the meantime, please feel free to share your experiences with multi-temperature workload management in the comments.