In any Database Partitioning Feature (DPF) warehouse environment, the replicated Materialized Query Table (MQT) plays an important role in improving database performance. The improvement can be significant for SQL statements that frequently join large fact tables with small dimension tables: the replicated MQT keeps a copy of the dimension data locally on each data partition, so the join no longer has to broadcast data across all the data partitions.
One common approach to avoiding any data broadcast is to replicate the dimension tables across the data partitions. This approach turns non-co-location joins into co-location joins. A co-location join can only occur when certain criteria are met: the joined tables must be in the same database partition group, their distribution keys must have the same number of columns with partition-compatible data types, and the join must include equality predicates on all of the distribution key columns. A replicated table satisfies these conditions by definition, because a full copy of it exists on every partition.
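As a minimal sketch of what those criteria look like in DDL (the table, column, and table space names here are illustrative assumptions, not objects from the example that follows), the two tables below can be joined with co-location because they share a table space, and therefore a database partition group, and are distributed on the same key:

-- Illustrative only: both tables are in the same table space (and database
-- partition group) and are hash-distributed on CUSTOMER_ID, so an equijoin
-- on CUSTOMER_ID can be resolved locally on each data partition.
CREATE TABLE SALES_FACT (
  CUSTOMER_ID INTEGER NOT NULL,
  SALE_AMOUNT DECIMAL(12,2)
) IN TBS_FACT_DATA
DISTRIBUTE BY HASH (CUSTOMER_ID);

CREATE TABLE CUSTOMER_DIM (
  CUSTOMER_ID INTEGER NOT NULL,
  CUSTOMER_NM VARCHAR(100)
) IN TBS_FACT_DATA
DISTRIBUTE BY HASH (CUSTOMER_ID);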
Let’s look at an example of how to convert a non-co-location join to a co-location join, and how that conversion improves performance.
For this example, we used the following environment:
The test query joins the fact table DATAMARTS.FACT_CUST_PROFTBLTY with the dimension table DATAMARTS.DIM_CUSTOMER:
FROM DATAMARTS.FACT_CUST_PROFTBLTY F,
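Only this fragment of the statement is shown above; as a sketch, a join of this shape (the column names and aggregation are illustrative assumptions, only the table names come from the example) looks like the following:

SELECT D.CUSTOMER_NM,
       SUM(F.PROFIT_AMT) AS TOTAL_PROFIT
FROM DATAMARTS.FACT_CUST_PROFTBLTY F,
     DATAMARTS.DIM_CUSTOMER D
WHERE F.CUSTOMER_ID = D.CUSTOMER_ID
GROUP BY D.CUSTOMER_NM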
All of the DIM_CUSTOMER dimension table data was sent to all the data partitions using a broadcast table queue (BTQ, shown as Q1), and no hashing was done to determine the receiving partitions (see Figure 1).
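You can verify whether a query relies on a BTQ by capturing its access plan. A minimal sketch from the DB2 command line, assuming a database named SALESDB and that the explain tables already exist (they can be created with the EXPLAIN.DDL script shipped in sqllib/misc):

db2 "SET CURRENT EXPLAIN MODE EXPLAIN"
db2 "SELECT ..."            # run the join here; in explain mode it is explained, not executed
db2 "SET CURRENT EXPLAIN MODE NO"
db2exfmt -d SALESDB -1 -o plan_before_mqt.txt

In the formatted plan, a BTQ operator above the DIM_CUSTOMER scan indicates that the dimension rows are being broadcast to every partition.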
With this setup, changing the row relocation from BTQ to a simple co-location involves just a few easy steps. First, create table spaces TBS_REP_DATA and TBS_REP_INDEX in the same database partition group as the fact table.
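A minimal sketch of that step, assuming the fact table resides in a database partition group named PDPG (the name is an assumption) and that automatic storage is in use:

db2 +v "CREATE TABLESPACE TBS_REP_DATA IN DATABASE PARTITION GROUP PDPG MANAGED BY AUTOMATIC STORAGE"
db2 +v "CREATE TABLESPACE TBS_REP_INDEX IN DATABASE PARTITION GROUP PDPG MANAGED BY AUTOMATIC STORAGE"

Because both table spaces belong to the fact table's partition group, the replicated copies will be placed on exactly the partitions where the fact data lives.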
Next, replicate the table. Make sure a unique index is present on the base table before replicating.
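If no unique index exists yet, a minimal sketch of creating one on the base table (CUSTOMER_ID is an assumed key column) looks like this:

db2 +v "CREATE UNIQUE INDEX DATAMARTS.UX_DIM_CUSTOMER ON DATAMARTS.DIM_CUSTOMER (CUSTOMER_ID)"

With the unique index in place, create and populate the replicated MQT: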
db2 +v "
CREATE TABLE DATAMARTS.R_DIM_CUSTOMER AS
(SELECT * FROM DATAMARTS.DIM_CUSTOMER)
DATA INITIALLY DEFERRED
REFRESH IMMEDIATE IN TBS_REP_DATA
INDEX IN TBS_REP_INDEX REPLICATED"
db2 +v "REFRESH TABLE DATAMARTS.R_DIM_CUSTOMER"
Then create any other custom indexes on the replicated table manually, since they are not created automatically. You cannot create a unique index or constraints on the replicated table; all constraints should be defined on the base table. Finally, perform REORG and RUNSTATS on the replicated table, as shown below.
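A minimal sketch of that housekeeping, assuming a non-unique index on an often-joined column (CUSTOMER_ID is an assumption):

db2 +v "CREATE INDEX DATAMARTS.IX_R_DIM_CUSTOMER ON DATAMARTS.R_DIM_CUSTOMER (CUSTOMER_ID)"
db2 +v "REORG TABLE DATAMARTS.R_DIM_CUSTOMER"
db2 +v "RUNSTATS ON TABLE DATAMARTS.R_DIM_CUSTOMER WITH DISTRIBUTION AND DETAILED INDEXES ALL"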
Upon creation of the replicated table, the explain plan shows that the BTQ has been eliminated and replaced by a local co-located join.

Figure 2. Data access plan when the MQT is available in the database
The total execution time was 23.8 times faster with the MQT, an improvement of 95.8 percent (1 - 1/23.8 ≈ 0.958).
| Case | Total Query Cost (Timerons) | Execution Time (Seconds) | Improvement |
These results have been replicated with IBM customers. The UK-based consumer goods company Reckitt Benckiser was looking for a way to optimize its data access plans using replication. With help from IBM, the company conducted a proof of concept that demonstrated a significant performance improvement with the co-location approach compared with the earlier BTQ-based plans.
In assessing the existing database design, the IBM team identified several issues:
During the exercise, the IBM team identified five fact tables that were very important for business reporting and shared very similar, frequently joined columns.
With just a few steps, the IBM team was able to achieve another three percent gain in performance compared with the lab results reported above.
db2 +v "
CREATE TABLE DATAMARTS.R2_DIM_CUSTOMER
AS (SELECT * FROM DATAMARTS.DIM_CUSTOMER)
DATA INITIALLY DEFERRED
REFRESH IMMEDIATE IN TBS_REP2_DATA
INDEX IN TBS_REP2_INDEX REPLICATED"
db2 +v "REFRESH TABLE DATAMARTS.R2_DIM_CUSTOMER"
| Case | Execution Time (Seconds) | Improvement |
The DB2 DPF environment is an excellent scale-out solution for any data warehouse system. As the examples above demonstrate, the replicated MQT approach can be very useful in improving the performance of business reporting systems and ultimately delivering a better end-user experience.
Any questions? Let me know if you have any challenges with this approach.