Big Data and Warehousing

5 Steps for Migrating Data to IBM DB2 with BLU Acceleration

Easily optimize data from an existing data warehouse for big data analytics

In the dynamics of today’s business world the success of any organization depends largely on organizational decisions made at the appropriate times. And a reliable data warehouse environment plays a pivotal role in helping drive the business decisions that can move an organization toward successful outcomes.

IBM® DB2® 10.5 with BLU Acceleration offers an innovative database and highly advanced technology for business intelligence (BI) applications that is designed to provide cost-effective, optimized performance and ease of use in the following ways:

  • Performance: DB2 10.5 with BLU Acceleration provides a fit-for-purpose, next-generation database for big data analytics.
  • Simplicity: DB2 10.5 with BLU Acceleration creates no complexities around the database design; simple tables can be created and data can be loaded into them with the workload set to ANALYTICS.
  • Cost savings: DB2 10.5 with BLU Acceleration offers a tremendous level of compression that helps significantly reduce storage and costs for backup media.

Migrating data from an existing warehouse to DB2 10.5 with BLU Acceleration can be handled with ease and provide several benefits at the same time.

 

Easy migration setup

The data migration task is easy to implement in DB2 10.5 with BLU Acceleration. The only requirement is to have an understanding of the application to be able to decide whether to convert the tables from row organize to column organize (see Figure 1).

 

Schema Purpose and Activity   Table Organization Decision
STAGING Store data extracts as is from an enterprise resource planning (ERP) application and other source systems. The month-end activity extensively uses the Range Partitioning DETACH, LOAD, and ATTACHES processes. No conversion of the table from row organize to column organize to avoid making changes to the month-end processes.
 
DATASTORE Intermediate transformed data and mostly tables with no Multi Dimensional Clustering (MDC) and Range Partitioning features. Convert tables from row organize to column organize.
 
DATAMARTS Presentation layer tables with no additional table features, except materialized query tables (MQTs), facilitate colocation joins. Drop MQTs and convert base tables from row organize to column organize.

Figure 1. Decision making schema for table conversion based on the intended application

 
For this performance case study between DB2 10.5 with BLU Acceleration and the existing software stack, the source system has a processor that is 33.6 times faster than the target DB2 10.5 with BLU Acceleration test server. The source system also has 16 times more memory than the target system.1

The following five steps can be used to perform the data migration task:

1. Set the workload and create a 32 KB page size database.

db2set DB2_WORKLOAD=ANALYTICS

Note: The ANALYTICS workload setting will automatically set INTRA_PARALLEL (DBM) to YES, DFT_DEGREE (DB) to ANY, and DFT_TABLE_ORG (DB) to COLUMN.

2. Extract the table definition from the source system, and create the similar table in the target with an additional ORGANIZE BY ROW clause in the create table statement—without any constraints and secondary indexes.

3. Load data into the DB2 10.5 with BLU Acceleration database tables using the OS pipe, or using flat files.

4. Convert the data and index table spaces from DMS to AUTOMATIC STORAGE, if any.

db2 “ALTER TABLESPACE MANAGED BY AUTOMATIC STORAGE”

5. Convert the tables from row organization to column organization using the db2convert utility:

db2convert DATASTORE.GL_PRI_AGG_DS
Table RowsNum InitSize (MB) FinalSize (MB) CompRate (%) State
——————————————————————————————-“DATASTORE”.”GL_PRI_AGG_DS” 24766845 2290.00 1100.12 51.96 CompletedPre-Conversion Size (MB): 2290.00
Post-Conversion Size (MB): 1100.12
Compression Rate (Percent): 51.96

 

Data compression and relative performance results

The IBM DB2 10.5 with BLU Acceleration database tables were compressed by 83 percent of the base uncompressed data—no index—on earlier versions of DB2.2 The storage-savings graph shown in Figure 2 provides an eye-opening comparison—but the best part is yet to come.

 
saraswatipura_fig2

Figure 2. Substantially increased compression for DB2 10.5 tables compared with earlier DB2 versions

 
Overall storage savings, including the indexes on the source system, resulted in about 93 percent, which represents a highly significant 14 times reduction.3 What do these results mean? Primarily, they demonstrate efficient use of storage on the server, significantly smaller data volume and hence performance benefits (see Figure 3) and considerable cost savings for backup media, and an enhanced end-user experience.

 
saraswatipura_fig3

Figure 3. Dramatically enhanced performance for integrated IBM Cognos BI reporting

 

Rapid data migration

The entire system was successfully migrated, integrated, and tested with the application in less than two days. The process was particularly seamless and required significantly reduced effort. The performance of the new system demonstrates that DB2 10.5 with BLU Acceleration is designed to provide optimal performance that helps deliver big data analytics solutions at the speed of thought.

What do you think about these results? Share any thoughts or questions in the comments area.

1,2,3 Results based on testing performed May 2013 under the guidance of IBM Hursley Lab — and Simon Woodcock — in the UK by Mohankumar Saraswatipura on a Reckitt Benckiser D5100 Data stage server.

Previous post

Three Data Categories Likely Missing in Your Data Warehouse

Next post

Data Warehouse Testing: Part 1

Mohankumar Saraswatipura

IBM Champion Mohankumar Saraswatipura started his career in DB2 8.1. He has worked as a database application developer and a performance consultant in High Performance On Demand Solutions (HiPODS) for IBM Software Labs, India. He is currently working as a Lead DB2 Database Administrator, helping application developers on Balanced Warehouse D5100, Siebel 8, and SAP ECC implementations. He is experienced in DB2 application performance tuning and database design. He is an IBM-certified application developer, IBM InfoSphere solution designer, XML solution developer, and advanced DB2 database administrator. Mohan is an active member of IDUG India. He completed his Master of Technology degree in computer science in 2004 and received his Executive MBA from IIMC in 2007.

  • Gerald

    Hi,
    when will BLU be finally availabe for Windows? We are waiting already for months…

  • Nishant Bhardwaj

    Hi Mohan,
    Thanks again for another wonderful article. While there are many articles on BLU technology on internet, I really like this one because it is crisp and concise, in form of cookbook. Now I understand why IBM states BLU technology as super easy and super fast!

  • Naveen Somayaji K N

    Hi Mohankumar, I’m Naveen from IBM software labs. There has been a change in functionality post beta for DB2 10.5 BLU where when you set DB2_ANALYTICS=WORKLOAD, it will not automatically set INTRA_PARALLEL (DBM) as YES. We have to manually set it to YES instead. Can you please modify the blog to avoid discrepancy in the document? You can refer DB2 10.5 Info center on latest DB2_WORKLOAD settings: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.regvars.doc/doc/r0005658.html#r0005658__S_DB2_WORKLOAD

    • Mohan

      Thanks Naveen for letting me know, I will change this document to reflect the same.

  • Mohan

    Paul,

    The db2convert utility is to convert the table from ROW ORGANIZE to COLUMN ORGANIZE. The source database was ROW organized and now converting it to COLUMN organized to take the benefit of BLU acceleration for analytic workload.

  • Paul

    The numbered list order is 1, 1, 2, 3, 1 with no apparent reason for going back to 1 two times. Are these different processes or is the numbering just wrong and should be 1-5?

    It not clear why you would convert tables using the db2convert utility (3rd #1 in the list) when it seems that you have already created the row based tables (2nd #1 in the list) and loaded them (#2 in the list).

    • http://tdagroup.com Terry O’Donnell

      An HTML issue caused the numbering to be off. Should be fixed now. Thanks.