Data Architect: DB2 Data Warehouse Performance, Part 1
Data Architect: DB2 Data Warehouse Performance, Part 1
Translate OLTP-tuning skills into effective performance management for DB2-based business intelligence systems.
By Robert Catterall March/2009
Printer-Friendly Version
Email this Story
Bookmark to del.ico.us
Digg It!

Robert Catterall During a data warehousing presentation at the International DB2 Users Group conference in Warsaw last October, IBM's Willie Favero showed text from the 1983 announcement letter that introduced the DB2 database management system. That announcement positioned DB2 as an excellent foundation for decision-support applications (this was before the term "data warehouse" had been coined).

I well remember the DB2 launch, because I was working for IBM at the time. DB2 was indeed a great choice for business intelligence (BI) systems, but its programming productivity benefits — an English-like data manipulation language and DBMS-determined data access path selection — were equally appealing to IT people who were responsible for building and maintaining operationally focused applications. IBM responded to the voice of the customer by delivering a slew of performance enhancements over several DB2 releases that enabled dramatic increases in application throughput. DB2 — on Linux, Unix, and Windows (LUW) servers and on mainframes — became hugely popular as a data server for transaction processing workloads, with plenty of those being in the 1,000+ transactions-per-second category.

These days, interest in data warehousing is rising across industries as companies seek to extract actionable intelligence from their data assets to drive better decision-making. Businesses and government entities that have long used DB2 for operational applications are now building DB2-based BI systems. That development has some DB2 online transaction processing (OLTP) veterans wondering if they can be equally effective in supporting a data warehouse, particularly when it comes to performance management. This article is for those professionals. The subject is broad in scope, so I'll cover it in two installments.

Data Warehouse vs. OLTP Performance Management: Apples and Oranges

Performance management in a DB2 data warehouse environment isn't the same as monitoring and tuning a DB2-based OLTP application. Here are some of the key differences:

Individual SQL statements vs. transactions. In an OLTP environment, the focus is on transactions that typically contain multiple SQL statements and are often expected to complete in less than a second. For a BI application, a "transaction" (a user interaction with the system) may involve the execution of just one SQL statement, and that statement may run for several minutes or even an hour or more without being thought of as "slow." A user might be very happy to get a report in an hour if it once took 10 hours to run.

Facts and dimensions. A DB2 database used for OLTP work will likely have a traditional third-normal-form design (or something close to that). Data warehouse database designs, on the other hand, are often dimensional in nature, with sets of related tables arranged in "star schemas" (a central "fact" table and associated dimension tables).

Continuous vs. overnight database updates. For an OLTP application, database updates tend to occur around the clock, seven days a week. On the BI side, despite growing interest in near real-time updating of database values, a data warehouse database is typically updated at night, often via massive extract, transform, load (ETL) runs. Query access is typically unavailable during ETL processing, making timely completion of the database update process an imperative.

Small vs. large result sets. SELECT statements in OLTP transaction programs typically retrieve just a few database rows (often only one or two). Data warehouse queries — particularly those used to generate reports or online analytic processing cubes — may return hundreds of thousands (or even millions) of rows.

Complex vs. simple queries. SELECT statements in OLTP transaction programs are often quite simple: one or two tables accessed, little or no dynamic table-building, and little or no on-the-fly transformation of data values or types. Queries associated with BI applications might be several pages long, with joins of a dozen or more tables, nested or common table expressions, recursive SQL, data value-changing CASE expressions, and data-type transformation via CAST specifications or scalar functions.

SQL you have to deal with vs. SQL you wrote (or at least reviewed). In a data warehouse environment, SQL is often generated by reporting or OLAP tools, with no opportunity for you to change it before it's executed. It'll be up to you to set up a DB2 environment in which such queries can run well.

Basically, a DB2 professional helps to deliver good data warehouse performance by getting two things right:

  • Setting up the DB2 environment so as to give queries the best chance of running well
  • Effectively tuning queries that are running too long (in spite of your having done a good job with the first task).

This article focuses on setting up the DB2 environment. I'll deal with the tuning of data warehouse SQL statements in the next issue.

Getting the DB2 Environment Right

Having a DB2 data warehouse environment that promotes good query performance involves both system-level and database-level actions. With respect to the DB2 system, pay attention to the following:

Leverage 64-bit addressing. Big DB2 buffer pools are always helpful, but they are especially useful for I/O-intensive data warehouse workloads. Many experienced DB2 people who are accustomed to working within the confines of a 2GB (mainframe) or 4GB (Linux/Unix/Windows) memory space have been a little slow to get with the 64-bit program.

Server memory sizes are humongous these days: You can get a terabyte or more of system memory on an IBM System z mainframe, System p server (AIX or Linux), or System x server (Windows or Linux). DB2 on any of those platforms will support a buffer-pool configuration size of at least a terabyte.

If you have a server with a large memory resource, you need to think big in terms of DB2 buffer pools. I've seen DB2 running on a server with 40GB of system memory with an 800MB buffer pool configuration. That's way too small: 10-20GB (at least) would be more appropriate in that case.

Next Page >>


Comments? Questions?

Give us your feedback or ask a question of the author.

Please enter your e-mail address below:

CAREER CENTER
Ready to take that job and shove it?
SEARCH JOBS
RECENT JOB POSTINGS
CAREER NEWS
DoD Recognizes University Scientists For Basic Research
Annual awards to university faculty to conduct next-generation research projects were announced this week by the Defense Department.

Sponsored links:





Visit these other IBM and TechWeb Partner Sites: :
Maximizing ROI Through Business Process Management (BPM) and Service-Oriented Architecture (SOA)
Internet Evolution – The Macrosite for News, Analysis, & Opinion About the Future of the Internet
Business Innovation – Technology Strategies and Solutions for Driving Business Success