Databases

How to Improve Performance in Warehouse Systems Using Replicated MQTs

Simple steps for substantially reducing execution time

In any Database Partitioning Feature (DPF) warehouse environment, replicated Materialized Query Tables (MQTs) play an important role in improving database performance. The improvement can be significant for SQL statements that frequently join large fact tables with small dimension tables: the replicated MQT assists those joins by presenting the dimension data locally on each data partition instead of broadcasting it across all the data partitions.

One common approach to avoiding data broadcasts is to replicate the dimension tables across the data partitions. This approach enables co-located joins instead of non-co-located joins. A co-located join can only occur when certain criteria are met:

  1. Both joining tables are in the same database partition group.
  2. The distribution keys for both tables have the same number of columns and compatible data types.
  3. The distribution key columns are partition-compatible.
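As a minimal sketch of these criteria, the two hypothetical tables below share a database partition group (via the same table space) and are distributed by hash on the same key, so a join on CUST_KEY can be co-located. All object names here are illustrative, not from the example environment:

```sql
-- Both tables live in the same table space (hence the same database
-- partition group) and are distributed by hash on the same key with
-- the same data type, so a join on CUST_KEY is co-located.
CREATE TABLE DATAMARTS.FACT_SALES (
  CUST_KEY INTEGER NOT NULL,
  AMOUNT   DECIMAL(15,2)
) IN TBS_FACT_DATA
  DISTRIBUTE BY HASH (CUST_KEY);

CREATE TABLE DATAMARTS.DIM_CUST (
  CUST_KEY  INTEGER NOT NULL,
  CUST_NAME VARCHAR(100)
) IN TBS_FACT_DATA
  DISTRIBUTE BY HASH (CUST_KEY);
```

In a real warehouse, however, a small dimension table is usually kept on a single partition, which is exactly why the replication approach described below is needed.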

Let’s look at an example of how to convert a non-co-located join to a co-located join, and how that conversion improves performance.

 

Enhancing performance with a few easy steps

For this example, we used the following environment:

  • IBM InfoSphere Balanced Warehouse D5100 ran IBM DB2 9.7 FP5 on SUSE Linux Enterprise Server (SLES) 10.
  • 11 physical servers were equipped with four 2800 MHz CPUs and 32 GB of memory.
  • The database included 41 data partitions (40 data partitions and one coordinator partition).
  • The fact table included 508 million records spread over 40 data partitions.
  • The dimension table included 111,146 records in a single admin node coordinator partition.
  • The Query Optimization DB CFG DFT_QUERYOPT was set to 2.

The following query joins the fact table with the dimension table; we will compare its access plan with and without the replicated MQT:

SELECT F.CUST_KEY,
       F.PROD_SKU_KEY,
       F.BUSS_UNIT_KEY,
       F.UNIT_YTD_VOLUME,
       D.CUST_NAME,
       D.CUST_STRATEGIC_DIR_CODE,
       D.TRD_TY_CODE
FROM DATAMARTS.FACT_CUST_PROFTBLTY F,
     DATAMARTS.DIM_CUSTOMER D
WHERE F.CUST_KEY = D.CUST_KEY

Figure 1. Data access plan when the MQT is not available in the database

 

Without the MQT, all the DIM_CUSTOMER dimension table data were sent to all the data partitions through a broadcast table queue (BTQ, shown as Q1 in the plan), and no hashing was done to determine the receiving partitions (see Figure 1).

With this setup, changing the row relocation from a broadcast to a simple co-located join involves just a few easy steps. First, create table spaces TBS_REP_DATA and TBS_REP_INDEX in the same database partition group as the fact table.
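This first step can be sketched as follows. The partition group name PDPG_DATA is an assumption for illustration; substitute the partition group that actually holds your fact table:

```sql
-- Create the table spaces for the replicated MQT and its indexes in
-- the same database partition group as the fact table (here assumed
-- to be PDPG_DATA).
CREATE TABLESPACE TBS_REP_DATA
  IN DATABASE PARTITION GROUP PDPG_DATA
  MANAGED BY AUTOMATIC STORAGE;

CREATE TABLESPACE TBS_REP_INDEX
  IN DATABASE PARTITION GROUP PDPG_DATA
  MANAGED BY AUTOMATIC STORAGE;
```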

Next, replicate the table; make sure a unique index exists on the base table before replicating.

db2 +v "
CREATE TABLE DATAMARTS.R_DIM_CUSTOMER AS
(SELECT * FROM DATAMARTS.DIM_CUSTOMER)
DATA INITIALLY DEFERRED
REFRESH IMMEDIATE IN TBS_REP_DATA
INDEX IN TBS_REP_INDEX REPLICATED"

 

db2 +v "REFRESH TABLE DATAMARTS.R_DIM_CUSTOMER"

Then create any other custom indexes manually, since they will not be created automatically. Note that you cannot create a unique index or constraints on a replicated table; all constraints should be created on the base table. Finally, perform REORG and RUNSTATS on the replicated table.
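The final maintenance step might look like this (the statistics options shown are a common choice, not a requirement):

```sql
-- Reorganize the replicated table, then collect detailed statistics
-- so the optimizer can cost the co-located join accurately.
REORG TABLE DATAMARTS.R_DIM_CUSTOMER;
RUNSTATS ON TABLE DATAMARTS.R_DIM_CUSTOMER
  WITH DISTRIBUTION AND DETAILED INDEXES ALL;
```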

Upon creation of the replicated table, the explain plan shows that the BTQ has been eliminated and replaced by a local co-located join.

Figure 2. Data access plan when the MQT is available in the database

 

The total execution time was 23.8 times faster with the MQT, an improvement of 95.8 percent.
 
Table 1. Execution time and total query cost results with and without MQT

Case          Total Query Cost (Timerons)   Execution Time (Seconds)   Improvement
Without MQT   1,718,770                     1168
With MQT      175,576                       49                         95.8%
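As a quick sanity check on Table 1, the speedup and percentage improvement follow directly from the two execution times:

```python
# Verify the reported speedup and improvement from Table 1.
def improvement(before_s: float, after_s: float) -> tuple[float, float]:
    """Return (speedup factor, percent reduction in execution time)."""
    speedup = before_s / after_s
    pct = (before_s - after_s) / before_s * 100.0
    return speedup, pct

speedup, pct = improvement(1168, 49)
print(f"{speedup:.1f}x faster, {pct:.1f}% improvement")  # 23.8x faster, 95.8% improvement
```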

 

Boosting performance further with an IBM customer

These results have been replicated with IBM customers. The UK-based consumer goods company Reckitt Benckiser was looking for a way to optimize data access plans on top of replication. With help from IBM, the company conducted a proof of concept that demonstrated a significant improvement in performance with the co-location approach compared with the earlier broadcast (BTQ) approach.

In assessing the existing database design, the IBM team identified several issues:

  • There were 77 large fact tables created in a single database partition group, spread across 40 data partitions. The replicated tables were also in the same database partition group.
  • Since there could be only one distribution map, it was difficult to keep data and processing skew consistently low across all the fact tables.
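Data skew of this kind can be measured directly. A query along the following lines (fact table and distribution key taken from the earlier example; adapt to your own schema) counts rows per data partition, where a large spread between partitions indicates skew:

```sql
-- Row counts per data partition; a large gap between the highest and
-- lowest counts indicates data skew on the distribution key.
SELECT DBPARTITIONNUM(CUST_KEY) AS PARTITION_NUM,
       COUNT(*) AS ROW_COUNT
FROM DATAMARTS.FACT_CUST_PROFTBLTY
GROUP BY DBPARTITIONNUM(CUST_KEY)
ORDER BY ROW_COUNT DESC;
```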

During the exercise, the IBM team identified five fact tables that were very important for the business reporting and had very similar, frequently joined columns.

With just a few steps, the IBM team was able to achieve a further gain in performance over the lab results reported above (98.03 percent versus 95.8 percent overall).

  1. The team created a separate database partition group and a tablespace for this very important set of tables (five of them in this case).
  2. The team then copied the data into the new tablespace and renamed the old tables. This step reduced data skew from 12 percent to less than 3 percent. One limitation remained: because the fact tables now lived in a different database partition group from the replicated MQT DATAMARTS.R_DIM_CUSTOMER, the DB2 optimizer could not consider that MQT for a co-located join.
  3. To overcome the limitation, the IBM team created one more set of tablespaces in the new database partition group for the second set of replicated MQT and the replication on top of it.

db2 +v "
CREATE TABLE DATAMARTS.R2_DIM_CUSTOMER
AS (SELECT * FROM DATAMARTS.DIM_CUSTOMER)
DATA INITIALLY DEFERRED
REFRESH IMMEDIATE IN TBS_REP2_DATA
INDEX IN TBS_REP2_INDEX REPLICATED"

db2 +v "REFRESH TABLE DATAMARTS.R2_DIM_CUSTOMER"

 
Table 2. Execution time results with and without MQT

Case          Execution Time (Seconds)   Improvement
Without MQT   1168
With MQT2     23                         98.03%

The DB2 DPF environment is a perfect scale-out solution for any data warehouse system. As the examples above demonstrate, the MQT approach can be very useful in improving business reporting system performance and ultimately delivering a better end-user experience.

Any questions? Let me know if you have any challenges with this approach.


Mohankumar Saraswatipura

IBM Champion Mohankumar Saraswatipura started his career with DB2 8.1. He has worked as a database application developer and a performance consultant in High Performance On Demand Solutions (HiPODS) for IBM Software Labs, India. He is currently working as a Lead DB2 Database Administrator, helping application developers on Balanced Warehouse D5100, Siebel 8, and SAP ECC implementations. He is experienced in DB2 application performance tuning and database design. He is an IBM-certified application developer, IBM InfoSphere solution designer, XML solution developer, and advanced DB2 database administrator. Mohan is an active member of IDUG India. He completed his Master of Technology degree in computer science in 2004 and received his Executive MBA from IIMC in 2007.

  • Fanche

    Excellent article indeed. Do you have any feedback on this tuning tip in an SAP BW environment? What we observe in the DB2 DPF behind our SAP BW (Finance) looks very much like your example. But SAP does not seem to support MQTs, thus my question… Regards

  • Nishant

    Excellent work on Replicated MQT! The second workaround to improve performance was really good. I wish optimizer could pick MQTs from different partition groups.

  • Girish

    Fantastic article and a wonderful way of making use of two database partition groups along with MQT. Very well done, keep it up!!

  • Nishant

    Excellent! Very well explained and other DBAs can use this information in their DPF environments. Good work done!

    • mrinal

      Nice One…