This month, IBM Data Management magazine is taking an in-depth look into IBM DB2 Analytics Accelerator and its origins. We recently spoke with DB2 Analytics Accelerator chief inventor Namik Hrle and lead DB2 innovator Guogen (Gene) Zhang.
NH: In the last decade, business analytics has gained significant importance, and that trend is accelerating. Successful companies rely on business analytics as their main source of competitive advantage. And when analytics workloads become mission-critical, they start to require the same reliability, availability, and security as the transactional workloads.
It’s common knowledge that System z with DB2 for z/OS provides these characteristics better than any other platform in the industry. System z also generates most of the transactional data, which is the key input to business analytics—but in many cases, the analytical workloads are not implemented on System z. There are several reasons for this. In the past, these workloads did not require the highest quality of service provided by System z. The analytical workloads are also very resource-intensive for CPU and memory, and the only way to execute them efficiently on general-purpose databases like DB2 is to spend lots of time and effort in monitoring and tuning. In some cases, not even the best skills and tools are enough to achieve appropriate performance. This led many customers to deploy specialized data warehouse databases—despite the lower quality of services and complexity associated with data movement, high levels of data redundancy, and having to manage multiple platforms.
Our objective was to create a solution that combines the best of both worlds: DB2's industry-leading reliability, availability, security, and transactional engine, and a special-purpose analytical appliance (which means no tuning!) that delivers lightning-fast performance for complex data and processing-intensive queries.
I am convinced that the best answer to this challenge is a hybrid system consisting of two engines that are completely transparent to the users from both the programming and administration perspective. A hybrid car is a good analogy. From the driver’s perspective, it’s a single car: one steering wheel, one accelerator pedal, one brake pedal… but under the hood there are two engines that get activated transparently to the driver, based on various parameters that are monitored in real time.
NH: My team started working on this idea a few years ago. At that time, we decided to build our own analytical query engine from the ground up. We took a prototype of an in-memory, columnar-store query engine created by the research division at IBM and integrated it into DB2 in the hybrid fashion I described above. The product became generally available in November 2010 as the first implementation of a hybrid database.
Around the same time, IBM acquired Netezza, and we suddenly got the opportunity to take advantage of a proven, industry-leading analytical appliance with unbeatable price/performance. We changed trains, so to speak, but stayed on the same tracks. This was the birth of DB2 Analytics Accelerator. It brought the full power of Netezza to DB2 and elevated DB2 into a true universal database that incorporates the two most powerful database engines in the industry.
NH: The IBM Netezza 1000 has a revolutionary design based on principles that have allowed Netezza to provide the best price/performance in the market. Four key components make up Netezza 1000: SMP hosts, snippet blades (called S-Blades), disk enclosures, and a network fabric. These are carefully combined and preconfigured, which allows the queries to be executed extremely quickly.
S-Blades are intelligent processing nodes that make up the turbocharged massively parallel processing (MPP) engine of the appliance. Each S-Blade is an independent server that contains powerful multi-core CPUs, Netezza’s unique multi-engine Field Programmable Gate Arrays (FPGAs), and gigabytes of RAM; these elements are all balanced and designed to work concurrently to deliver peak performance. FPGAs are commodity chips that are designed to process data streams at extremely fast rates. Netezza employs these chips to filter out extraneous data based on the SELECT and WHERE clauses in the SQL statement, as quickly as data can be streamed off the disk. This filtering significantly reduces the amount of data, freeing downstream components from unnecessary processing.
The S-Blades also execute an array of basic database operations such as sorts, joins, and aggregations in the CPU cores. The CPU cores are designed with ample headroom to run embedded algorithms of arbitrary complexity against large data streams for advanced analytics applications.
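The two-stage S-Blade pipeline described above can be sketched in software. This is an illustrative analogy only, not IBM code: the first stage stands in for the FPGA, applying the SELECT list (projection) and WHERE predicate (restriction) as rows stream off disk, and the second stage stands in for the CPU cores, aggregating the much smaller surviving stream. All names and sample data here are invented.

```python
def fpga_stage(rows, keep_columns, predicate):
    """FPGA analogy: filter and project each row as it streams past."""
    for row in rows:
        if predicate(row):                               # WHERE clause
            yield {c: row[c] for c in keep_columns}      # SELECT list

def cpu_aggregate(rows, group_col, sum_col):
    """CPU-core analogy: aggregate the already-reduced stream."""
    totals = {}
    for row in rows:
        totals[row[group_col]] = totals.get(row[group_col], 0) + row[sum_col]
    return totals

# A small 'table' standing in for data streamed off disk
sales = [
    {"region": "EU", "year": 2011, "amount": 100},
    {"region": "US", "year": 2010, "amount": 250},
    {"region": "EU", "year": 2011, "amount": 50},
]

# Roughly: SELECT region, amount FROM sales WHERE year = 2011 GROUP BY region
reduced = fpga_stage(sales, ("region", "amount"), lambda r: r["year"] == 2011)
print(cpu_aggregate(reduced, "region", "amount"))  # {'EU': 150}
```

The point of the split is that the expensive per-row filtering happens as early as possible, so the aggregation stage only ever sees rows that survive the predicate.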
GZ: By leveraging Netezza appliance technology and integrating it with DB2 for z/OS, we acquired a new query processing technology almost instantaneously—one with industry-leading performance for complex analytic workloads. And it’s available at a very affordable price point on a premium OLTP platform (System z) for our customers. Now DB2 users can take advantage of this new technology to analyze vast volumes of transactional data on the platform—transparently. DB2 analyzes and routes queries typically seen in business intelligence (BI) and data warehousing applications from the System z environment to a workload-optimized accelerator—i.e., a Netezza server.
DB2 Analytics Accelerator complements DB2's traditional query processing and can speed up a substantial percentage of queries. It changes the cost and performance equation, all transparently to the end user and applications. It is fast and easy to deploy, and it can be easily activated and deactivated with the setting or clearing of a software switch.
The key differentiator of DB2 Analytics Accelerator relative to other offerings in the business intelligence and data warehousing space is its deep integration into DB2, which yields many advantages.
But before we go through that, let me quickly describe the basic DB2 processing structure. DB2 consists of multiple internal components that we call resource managers: Buffer Manager for data cache management, Internal Resource Lock Manager (IRLM) for locking, et cetera. These components are not visible to the outside world of applications and database administration, which connect to DB2 through the application and operational interfaces such as SQL and commands or utilities. The internal processing flow among the components is completely transparent to the outside world. So if we add another resource manager into DB2 and adhere to the existing interfaces, no application or database administration procedure needs to change. That’s exactly the design decision that we made with DB2 Analytics Accelerator.
For the outside world, the DB2 Analytics Accelerator is an integral part of DB2—one that is transparent, just like all other components that are behind standard DB2 APIs. Having the query accelerator within DB2 reduces the need for moving data outside of the enterprise warehouse database, which is common practice when creating data marts to accelerate specific segments of the database.
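The design point described above can be sketched as a simple facade pattern. This is a hedged illustration with invented names, not DB2's internals: the external interface (here, a single `run_sql` method) stays fixed, so a new internal component such as the accelerator can be wired in without any change to applications or administration procedures.

```python
class BufferManager:
    """Stand-in for an existing internal resource manager."""
    def execute(self, query):
        return "buffer-managed scan: " + query

class AcceleratorManager:
    """Stand-in for the newly added accelerator component."""
    def execute(self, query):
        return "accelerated: " + query

class Database:
    """The stable external interface: applications only ever call run_sql()."""
    def __init__(self):
        self._engines = [BufferManager()]

    def attach(self, engine):
        # Internal wiring changes; the run_sql() contract does not.
        self._engines.insert(0, engine)

    def run_sql(self, query):
        return self._engines[0].execute(query)

db = Database()
before = db.run_sql("SELECT COUNT(*) FROM ORDERS")  # served internally
db.attach(AcceleratorManager())                     # plug in the accelerator
after = db.run_sql("SELECT COUNT(*) FROM ORDERS")   # same API, new engine
```

The application issues the identical statement before and after the accelerator is attached; only the internal routing differs.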
But it is not only about data and movement of data. It is also about optimization—where to run a particular type of query workload. The DB2 Analytics Accelerator solution is able to optimize a mixed query workload automatically, and to execute queries in the most efficient way.
GZ: Exactly. If the two technologies were not tightly integrated, developers or users would have to try to figure out which query to submit to which components (DB2 or the accelerator). With the deep integration, it’s not something that the end user has to worry about. The DB2 query processor analyzes each query and determines the fastest and most cost-effective way to execute each query. There is no need for manual intervention of any kind.
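The routing decision can be illustrated with a toy cost model. This is not DB2's actual optimizer logic; the numbers and thresholds below are invented purely to show the idea of estimating each query's cost on each engine and transparently picking the cheaper one.

```python
def route(rows_scanned, is_complex_analytic):
    """Toy cost-based router: invented model, not DB2's optimizer.

    Short, indexed OLTP-style lookups stay in the transactional engine;
    large analytic scans amortize the offload overhead and go to the
    accelerator.
    """
    oltp_cost = rows_scanned                     # row-at-a-time processing
    accel_cost = 1_000 + rows_scanned / 100      # offload overhead + MPP scan
    if is_complex_analytic and accel_cost < oltp_cost:
        return "accelerator"
    return "db2"

print(route(50, False))          # small indexed lookup -> db2
print(route(10_000_000, True))   # big analytic scan -> accelerator
```

However the real cost model works, the essential property is the same: the decision is made per query by the database, so neither developers nor end users ever choose an engine explicitly.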
Query processing using the accelerator is like a new access path, similar to an index or Materialized Query Table (MQT). Once you have it installed and have data loaded into the accelerator, DB2 will utilize it for the queries, the best way it can. Most organizations have a database analyst who spends a great deal of time looking at query plans for their warehousing systems, trying to figure out whether to add an index to a table or rewrite the query. All of this inspection and work goes away. The SQL is moved out to the hardware accelerator so the end user and the database administrator don’t have to worry about it. You continue to use the same applications, and they have much faster query response time. And the accelerator frees up more System z CPU for transactional processing. The data retains its system of record in DB2 for z/OS and all of the security and operational attributes of DB2 data. The accelerator is fully managed via DB2 stored procedures, making it a truly plug-and-play appliance.
With the accelerator’s processing power, customers can now run queries that were once forbidden and forgotten because of their high cost or overly long response times. Now they can create more analytic workloads to satisfy business needs. The potential is endless.
NH: Many people today talk about appliances, but this offering truly meets the definition. Our customers are typically able to run queries starting just days after the box arrives. There is no need for schema changes, for application adjustments, for data conversion. You attach DB2 Analytics Accelerator to DB2, specify tables to be accelerated, prime them in DB2 Analytics Accelerator, and your queries can take full advantage of the special-purpose data warehouse appliance.
NH: For some data warehouse applications, data is typically moved to other platforms to provide faster analysis. With the new DB2 Analytics Accelerator, there is no reason to do that and pay the price of a more complex landscape, separate systems, administration tasks, and so on. DB2 now provides a platform that enables organizations to blend both their operational data store and their data warehouse.
In a typical data warehouse environment, data is extracted from the operational systems, put through a transformation, and loaded into the data warehouse. That movement increases data latency. It also creates a lot of cost and overhead, and it makes the data vulnerable to possible intrusions. With System z, everything is kept on the same platform, and DB2 and SQL are used to move data and do the transformations; the data never leaves the platform. This helps to lower data latency, reduce costs, and make it a simpler overall environment.
The product has been well received by the user community; we have a significant number of sales, and many more customers are strongly considering how they can utilize this game-changing technology.