Big Data and Warehousing

Netezza Migration Kung-Fu: Part 2

Working within the continuum of discovery

Didn’t someone say that the road to migration would be fairly easy? But did they mean easy to implement, or easy to test? We could actually square away our implementation cycle very quickly, and accelerators can make the process faster still.1 However, as you may remember from Part 1 of this article, we are de-engineering the prior implementation into a new one, which does not change the simple fact that the majority of the work will be testing, not development.

This may shock the project manager with a carefully laid-out implementation plan who casually adds a day or two of testing just to be safe. But beware the 80/20 rule here: you will spend 20 percent of your time in development and 80 percent in testing. Give testing short shrift, and quality will suffer.

So let’s say you have two brand-new Netezza appliances—one for production and one for development. It’s important to put these high-powered machines to work as quickly as possible to start getting a return on the investment. But before installing them, consider the task ahead; you’re facing an enormous amount of one-time migration work and testing that will allow you to create the foundational data for all operational data to follow.

Imagine this scenario: at one site, you were promised a high-powered machine that could perform a migration in two months. Instead, you got a machine with only a quarter of the power because company policy required you to put that hardware inside the production enclosure, where it became unavailable to you.

Since Netezza scales linearly, your original two-month timeframe became eight months. Even if you scrambled to do eight months of work in five, that wouldn’t impress a client who was promised a two-month timeline. The client will realize later what you already knew: there’s absolutely no benefit in attempting to build, test, QA, and performance-tune the entire solution on a lower-powered machine while the higher-powered one sits behind smoked glass doing nothing.

But wait! The functional port is only one slice. What about the legacy data? You’ll need to backfill it to the new machine, too, so it’s important to get an early handle on this process. That data sits behind a physical and functional fortress, and nobody ever intended for it to be migrated wholesale to another system. You can be assured that the surrounding infrastructure (network, disks, and so on) isn’t ready for its mass egress. It’s fine to extract and load small tables whole, but for larger tables, plan to chop them into multiple files that each contain some portion of the table. Do not attempt to extract the data with “join” logic; the former system’s tables usually are not indexed to support it. Just dump the tables and get them to the Netezza appliance. Likewise, don’t offload the data as a single file or “pipe” it from the old system to the new one as one monolithic extract. Always break apart time-protracted processes, because it’s too expensive to keep restarting them upon failure.
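The chunked-dump idea above can be sketched in a few lines. This is a minimal, hypothetical Python illustration, using sqlite3 as a stand-in for the legacy database; the table name, chunk size, and file-naming scheme are assumptions, not part of any real migration toolkit:

```python
import csv
import sqlite3

def extract_in_chunks(conn, table, chunk_rows, out_prefix):
    """Dump `table` into numbered CSV files of at most `chunk_rows` rows
    each, so a failed transfer costs one chunk, not the whole extract."""
    cur = conn.execute(f"SELECT * FROM {table}")  # plain dump, no joins
    headers = [d[0] for d in cur.description]
    files = []
    part = 0
    while True:
        rows = cur.fetchmany(chunk_rows)
        if not rows:
            break
        path = f"{out_prefix}.part{part:04d}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(headers)
            writer.writerows(rows)
        files.append(path)
        part += 1
    return files

# Demo against an in-memory stand-in for the legacy system
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(10)])
parts = extract_in_chunks(conn, "orders", chunk_rows=4, out_prefix="orders")
print(parts)  # three files: 4 + 4 + 2 rows
```

Each chunk file can then be loaded into the appliance independently, and a failure midway means re-sending only the affected chunk rather than restarting a monolithic extract.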

Migration projects are, by nature, a continuum of discovery. When an organization’s migration team looks into the existing system, the subject-matter experts will almost always find things they never knew about. This process comes with three high-level knowledge risks:

  1. What you know
  2. What you don’t know
  3. What you don’t know that you don’t know

You can mitigate the first two risks with research. The last one, however, might trap you. How much did you think you knew, but forgot? Those little time bombs of compromise that your IT staff created long ago with good intentions—well, they’re back.

Here are a couple of things that probably caught you by surprise:

  • The growing backlog of user requests (some even driving the migration)
  • Unspoken functional expectations that the new system will solve problems that the former system never could

Address these problems with interviews. Recognize them formally in a specification document. Deliver what you know users expect. Remember that specifications adapt under control, but not under a cloud of the unspoken.

You want the users to experience mind-blowing speed. Now you can also stun them with rapid delivery, which buys you breathing room to deliver even more. How’s that for a twist in delivery protocol?

1 Netezza Data Integration Framework, Copyright © 2007-2012 Brightlight Consulting, All rights reserved.


David Birmingham

IBM Champion David Birmingham is a senior principal with Brightlight Consulting, where he focuses on solutions using the IBM Netezza® appliance. David has two books on the subject: Netezza Underground and Netezza Transformation, available on Amazon.com, and he drives the best practices sessions at the Enzee Universe. He has more than 25 years of experience in very-large-scale solution deployment. Connect with David on IBM developerWorks through his profile, the Netezza Underground blog, or meet him in person at IBM Insight conferences in Las Vegas for the Enzee Best Practices sessions.