<?xml version="1.0" encoding="iso-8859-1"?>

<rss version="2.0"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
 xmlns:admin="http://webns.net/mvcb/"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>

<title>IBM Database Magazine</title>
<link>http://ibmdatabasemag.com</link>
<description></description>
<language>en-us</language>
<copyright>Copyright 2006, CMP Media.</copyright>




		<item>
			<title><![CDATA[Taming a Terabyte of XML Data]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216500898&cid=RSSfeed]]></link>
			<description><![CDATA[Enterprises today are struggling to manage the increasing volume of XML data they generate or consume. Intel and IBM have executed the industry's first terabyte XML database benchmark, based on a financial application scenario, which shows the feasibility of managing high volumes of XML data on cost-effective hardware.]]></description>
			<pubDate>Wed, 15 Apr 2009 17:00:00 EDT</pubDate>
			<keywords><![CDATA[DB2 9 XML Benchmark, pureXML, TPoX]]></keywords>
			<blurb><![CDATA[Enterprises today are struggling to manage the increasing volume of XML data they generate or consume. Intel and IBM have executed the industry's first terabyte XML database benchmark, based on a financial application scenario, which shows the feasibility of managing high volumes of XML data on cost-effective hardware.]]></blurb>
			<authors><![CDATA[ Agustin Gonzalez, Intel Corporation and Matthias Nicola, IBM Silicon Valley Lab]]></authors>
			<body><![CDATA[
			
					
<P>
Undeniably, XML has emerged as the de facto standard for data exchange, service-oriented architectures (SOAs), and message-based transaction processing. As companies accumulate increasing amounts of XML data, they require more than message-processing technology that handles one XML document at a time. Companies have started to persist large volumes of XML documents, sometimes because of regulatory requirements and sometimes because of XML's flexibility or the well-known difficulties of converting XML data to relational format and back. Accustomed to the benefits of mature relational databases, companies expect the same capabilities for XML data: the ability to persist, query, index, update, and validate XML data with full ACID compliance, recoverability, high availability, and high performance.</p>
<P>

XML is sometimes considered a  verbose and slow data format, especially in terms of processing large numbers  of documents. This view of XML is often based on past experience with  insufficient technology. For example, storing many XML documents in a file  system and writing application code to parse and analyze them easily leads to  poor performance and disappointment. This scenario no longer applies with  state-of-the-art database and processor technology. </p>
<P>

Using DB2's pureXML capabilities and Intel multi-core  CPUs, Intel and IBM executed the industry's first terabyte  XML database benchmark to demonstrate that high-end transaction  processing over a terabyte of XML data is no longer wishful thinking. This  article describes these performance tests, the hardware used, the DB2  configuration, and the results and lessons learned. Various DB2 technologies  proved to be of critical importance, including deep compression, automatic  storage, self-tuning memory, and, of course, pureXML. The results quantify  DB2&rsquo;s multi-user scalability with Linux on Intel quad-core and six-core  processors (Intel Xeon Processor 7300 and 7400 Series). All system  configurations and tests reported in this article were performed by Intel at  Intel Labs.</p>
<h3>DB2 pureXML </h3>
<P>

DB2 pureXML provides support for XML data management, including XML storage, XML indexing, XML queries and updates, and optional document validation with XML Schemas. Users can define columns of type XML in which they can store one XML document per row. Tables can contain a mix of XML and relational columns, which makes the integration of XML and relational data easy. When XML documents are inserted or loaded into an XML column, they are parsed and stored in a parsed tree format. This allows queries and updates to operate on XML data without XML parsing, a key performance benefit. XML indexes can be defined on specific elements or attributes to ensure high query performance. Queries and updates are based on the SQL/XML and XQuery standards and can access both XML and relational data in a single statement if needed.</p>
<h3>The TPoX Benchmark</h3>
<P>

To demonstrate high-end XML performance, we chose to run the TPoX benchmark. <a href="http://tpox.sourceforge.net/" target="_blank">TPoX (Transaction Processing over XML)</a> is an open-source, application-level XML database benchmark based on a financial application scenario. It evaluates the performance of XML database systems, focusing on XQuery, SQL/XML, XML storage and indexing, XML Schema support, XML inserts, updates, and deletes, logging, concurrency, and other database aspects. TPoX simulates a securities trading scenario and uses a real-world XML Schema (<a href="http://www.fixprotocol.org/specifications/fix4.4fixml" target="_blank">FIXML</a>) to model some of its data. TPoX is designed to exercise a realistic and representative set of XML operations.</p>
<P>

The main logical  data entities in TPoX are:</p>
<ul>
  <li><strong>Customer:</strong> A  single customer can have one or multiple accounts</li>
  <li><strong>Account:</strong> Each account contains one or multiple holdings</li>
  <li><strong>Holding:</strong> A  number of shares of a security</li>
  <li><strong>Security:</strong> A  stock, bond, or mutual fund</li>
  <li><strong>Order:</strong> Each  order buys or sells shares of exactly one security for exactly one account</li>
</ul>
<P>

For each customer, there is an XML document that contains all personal information, account information, and holding information for that customer (Figure 1). Each order is represented by an XML message that complies with the FIXML 4.4 schema. FIXML is a complex industry-standard XML schema for trade-related messages such as buy or sell orders. Each security is described by a single XML document. The collection of 20,833 security documents represents the majority of US-traded stocks, bonds, and mutual funds. While the number of security documents is fixed, the benchmark scenario scales in the number of <code>custacc</code> and <code>order</code> documents. The 1TB TPoX database uses 300,000,000 <code>order</code> and 60,000,000 <code>custacc</code> documents.</p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/clip_image001.gif" alt="" width="325" height="174" border="0"><br>
    <strong>Figure 1. </strong>TPoX data entities. </p>
<P>

The TPoX workload consists of 17 transactions, listed in Table 1. Their relative weight in the transaction mix is shown in the rightmost column. Insert, update, and delete operations amount to 30 percent of the workload, and queries to 70 percent. XML Schema validation is performed in transactions I2, U2, and U4.</p>
<table border="1" cellspacing="0" cellpadding="0">
  <tr>
    <td width="43" valign="top">
<P>
<strong>#</strong></p></td>
    <td width="414" valign="top">
<P>
<strong>Transaction</strong></p></td>
    <td width="60" valign="top"><p align="center"><strong>Weight</strong></p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>I1</strong></p></td>
    <td width="414" valign="top">
<P>
Customer    places a new order (insert order document)</p></td>
    <td width="60" valign="top"><p align="center">7%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>I2</strong></p></td>
    <td width="414" valign="top">
<P>
Add a new    customer (insert <code>CustAcc</code> document)</p></td>
    <td width="60" valign="top"><p align="center">1%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>D1</strong></p></td>
    <td width="414" valign="top">
<P>
An order is    cancelled or archived (delete order doc)</p></td>
    <td width="60" valign="top"><p align="center">7%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>D2</strong></p></td>
    <td width="414" valign="top">
<P>
Remove a    customer (delete <code>CustAcc</code> document)</p></td>
    <td width="60" valign="top"><p align="center">1%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>U1</strong></p></td>
    <td width="414" valign="top">
<P>
Close an    existing customer&rsquo;s account</p></td>
    <td width="60" valign="top"><p align="center">1%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>U2</strong></p></td>
    <td width="414" valign="top">
<P>
Open a new    account for an existing customer</p></td>
    <td width="60" valign="top"><p align="center">1%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>U3</strong></p></td>
    <td width="414" valign="top">
<P>
Update the    price of a security</p></td>
    <td width="60" valign="top"><p align="center">3%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>U4</strong></p></td>
    <td width="414" valign="top">
<P>
Update the    status of an order </p></td>
    <td width="60" valign="top"><p align="center">3%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>U5</strong></p></td>
    <td width="414" valign="top">
<P>
Execute a    &ldquo;buy&rdquo; order of a given security &amp; account: <br>
      1. If shares    already exist, increase the quantity;&nbsp; <br>
      otherwise, add    a new holding<br>
      2. Update    account balance and date value <br>
      3. Abort if    the max. number of holdings is exceeded</p></td>
    <td width="60" valign="top"><p align="center">3%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>U6</strong></p></td>
    <td width="414" valign="top">
<P>
Execute a &ldquo;sell&rdquo;    order (opposite of U5)</p></td>
    <td width="60" valign="top"><p align="center">3%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>Q1</strong></p></td>
    <td width="414" valign="top">
<P>
Retrieve an    order for a given order id</p></td>
    <td width="60" valign="top"><p align="center">10%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>Q2</strong></p></td>
    <td width="414" valign="top">
<P>
Retrieve a    security for a given ticker symbol</p></td>
    <td width="60" valign="top"><p align="center">10%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>Q3</strong></p></td>
    <td width="414" valign="top">
<P>
Get a    customer&rsquo;s personal data, construct profile document</p></td>
    <td width="60" valign="top"><p align="center">10%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>Q4</strong></p></td>
    <td width="414" valign="top">
<P>
Search    securities based on 4 predicates and return specific elements of interest</p></td>
    <td width="60" valign="top"><p align="center">10%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>Q5</strong></p></td>
    <td width="414" valign="top">
<P>
Construct an    account summary and statement</p></td>
    <td width="60" valign="top"><p align="center">10%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>Q6</strong></p></td>
    <td width="414" valign="top">
<P>
Retrieve the    price of a certain security</p></td>
    <td width="60" valign="top"><p align="center">10%</p></td>
  </tr>
  <tr>
    <td width="43" valign="top">
<P>
<strong>Q7</strong></p></td>
    <td width="414" valign="top">
<P>
Get a    customer&rsquo;s most expensive order</p></td>
    <td width="60" valign="top"><p align="center">10%</p></td>
  </tr>
</table>
<P>

<strong>Table 1.</strong> Business descriptions of TPoX transactions. </p>
<P>

The workload is executed  by a Java workload driver that spawns a configurable number of concurrent  threads to simulate concurrent users. Each thread connects to the database and  executes a stream of transactions without think times. When a transaction  commits, the thread that submitted the transaction immediately picks another  transaction from Table  1, randomly but with skewed probabilities based on the  transaction weights. At run time, the workload driver replaces parameter  markers in the transactions with concrete values drawn from random  distributions. The Java code of the workload driver is available as <a href="https://sourceforge.net/project/showfiles.php?group_id=185925&package_id=216664" target="_blank">open  source</a> and can be used for many types of database tests &ndash; not just the TPoX  benchmark. </p>
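<P>
The weighted selection described above can be illustrated with a short sketch. This is not the code of the Java workload driver; it is a minimal Python illustration that assumes the weights from Table 1, and the names <code>TPOX_WEIGHTS</code>, <code>pick_transaction</code>, and <code>simulate_user</code> are hypothetical.</p>

```python
import random

# Transaction weights in percent, as listed in Table 1.
TPOX_WEIGHTS = {
    "I1": 7, "I2": 1, "D1": 7, "D2": 1,
    "U1": 1, "U2": 1, "U3": 3, "U4": 3, "U5": 3, "U6": 3,
    "Q1": 10, "Q2": 10, "Q3": 10, "Q4": 10, "Q5": 10, "Q6": 10, "Q7": 10,
}


def pick_transaction(rng):
    """Pick one transaction id with probability proportional to its weight."""
    names = list(TPOX_WEIGHTS)
    weights = list(TPOX_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


def simulate_user(rng, n_transactions):
    """Simulate one user: a stream of picks with no think time in between.

    Returns how often each transaction was chosen (hypothetical helper).
    """
    counts = dict.fromkeys(TPOX_WEIGHTS, 0)
    for _ in range(n_transactions):
        counts[pick_transaction(rng)] += 1
    return counts
```

<P>
Over a long run, the share of each transaction in the returned counts should approach its weight in Table 1, which is the behavior the per-transaction percentages in the driver output reflect.</p>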
<h3>The  System Under Test</h3>
<P>

The test system (Figure  2) consists of the following hardware and software components:</p>
<ul type="disc">
  <li>Database       server</li>
  <ul type="circle">
    <li>Intel Xeon Processor 7400 Server</li>
    <li>4 CPUs, six cores per CPU, 2.66GHz, 16MB L3 cache per CPU</li>
    <li>64GB of main memory</li>
  </ul>
  <li>Client machine: Intel Xeon Processor 5400 Server</li>
  <li>Operating System for client and server: Linux SLES 10,       64bit</li>
  <li>Database software: DB2 9.5 for Linux, UNIX, and       Windows, Fixpack 2</li>
  <li>Client software: TPoX open source workload driver,       Java 1.5</li>
  <li>Storage</li>
  <ul type="circle">
    <li>EMC CX3-80</li>
    <li>15 disks per LUN (RAID 0)</li>
    <li>120 disks (8 LUNs) for the database</li>
    <li>15 disks (1 LUN) for the log</li>
    <li>30 disks (2 LUNs) for the raw data</li>
    <li>2 RAID controller cards (one for the flat files, one for        database and log)</li>
    <li>2 Fibre Channel connections (4Gb/s) per controller card</li>
  </ul>
</ul>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/clip_image003.gif" alt="Figure 2. System under test." width="497" height="287" border="0"><br>
    <strong>Figure 2.</strong> System under test.</p>
<h3>Intel  Xeon Processor 7400 and 7300 Series</h3>
<P>

To analyze how DB2 performance scales with an increasing number of cores per CPU, we ran the benchmark twice with different processors. The first set of tests used four Intel Xeon 7400 Series CPUs, which have six cores each. We then repeated the benchmark using four Intel Xeon 7300 Series CPUs, which have four cores each. The comparison in Table 2 shows that the CPUs differ in more than just the number of cores. While the Intel Xeon Processor 7400 Series has 50 percent more cores than the 7300 Series, its clock speed is roughly 10 percent lower (2.66GHz versus 2.93GHz), and it has a 16MB L3 cache, which the Intel Xeon 7300 Series does not have.</p>
<table border="1" cellspacing="0" cellpadding="0">
  <tr>
    <td width="145" valign="top">
<P>
<strong>Processor</strong></p></td>
    <td width="58" valign="top">
<P>
<strong>Cores</strong></p></td>
    <td width="93" valign="top">
<P>
<strong>Frequency</strong></p></td>
    <td width="77" valign="top">
<P>
<strong>L2 cache</strong></p></td>
    <td width="89" valign="top">
<P>
<strong>L3 cache</strong></p></td>
    <td width="96" valign="top">
<P>
<strong>Technology</strong></p></td>
    <td width="78" valign="top">
<P>
<strong>Watts</strong></p></td>
  </tr>
  <tr>
    <td width="145" valign="top">
<P>
Intel X7460<br>
      (Xeon 7400 Series)</p></td>
    <td width="58" valign="top">
<P>
6</p></td>
    <td width="93" valign="top">
<P>
2.66 GHz</p></td>
    <td width="77" valign="top">
<P>
3 x 3 MB</p></td>
    <td width="89" valign="top">
<P>
16 MB</p></td>
    <td width="96" valign="top">
<P>
45nm</p></td>
    <td width="78" valign="top">
<P>
130 W</p></td>
  </tr>
  <tr>
    <td width="145" valign="top">
<P>
Intel X7350<br>
      (Xeon 7300 Series)</p></td>
    <td width="58" valign="top">
<P>
4</p></td>
    <td width="93" valign="top">
<P>
2.93 GHz</p></td>
    <td width="77" valign="top">
<P>
2 x 4 MB</p></td>
    <td width="89" valign="top">
<P>
None</p></td>
    <td width="96" valign="top">
<P>
65nm</p></td>
    <td width="78" valign="top">
<P>
130 W</p></td>
  </tr>
</table>
<P>

<strong>Table 2. </strong>Intel Xeon processors used  in this benchmark. </p>
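<P>
The differences between the two processors can be put into numbers with a few lines of arithmetic. The sketch below is a Python illustration that uses only the figures from Table 2. The helper names are hypothetical, and the product of cores and clock speed is a crude proxy that ignores caches and memory bandwidth, so it is not a prediction of measured throughput.</p>

```python
# Per-CPU figures from Table 2; both test systems use four CPUs.
X7460 = {"cores": 6, "ghz": 2.66}
X7350 = {"cores": 4, "ghz": 2.93}
CPUS_PER_SERVER = 4


def percent_change(new, old):
    """Relative change of new versus old, in percent."""
    return (new - old) / old * 100.0


def total_cores(cpu):
    """Number of cores in a four-CPU server."""
    return cpu["cores"] * CPUS_PER_SERVER


def aggregate_ghz(cpu):
    """Cores times clock speed: a rough capacity proxy, not a benchmark result."""
    return total_cores(cpu) * cpu["ghz"]
```

<P>
With these inputs, the 7400 Series system has 24 cores instead of 16 (50 percent more), a clock speed that is about 9 percent lower, and roughly 36 percent more aggregate clock cycles per second.</p>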
<P>

When we switched the CPUs, all  other details of the hardware  and software remained identical. Both the Xeon 7400 and 7300 processor series  use the same chipset so that replacing one with the other is just a &ldquo;drop in&rdquo;  processor replacement with no other changes required.</p>
<h3>DB2 Configuration and Tuning</h3>
<P>

The DB2 database was created with  DB2's automatic storage feature and a page size of  16KB, using eight logical volumes plus a separate volume for the log. The database schema that we chose to implement the TPoX  scenario is very simple. It consists of three XML columns in three tables, one  for each of the three XML document types in TPoX (<code>order</code>, <code>custacc</code>, <code>security</code>):</p>
<pre>create table custacc (cadoc xml inline length 16288) <br>in custacc_tbs index in custacc_idx_tbs compress yes;
</pre>
<pre>create table order (odoc xml inline length 16288) <br>in orders_tbs index in orders_idx_tbs compress yes;
</pre>
<pre>create table security (sdoc xml inline length 16288) <br>in security_tbs index in security_tbs compress yes;</pre>
<P>

XML inlining and compression were used to reduce the storage footprint for the 1TB of raw XML data. We created five table spaces: one for each of the three tables, plus one for <code>custacc</code> indexes and one for <code>order</code> indexes. All table spaces were configured with <code>NO FILE SYSTEM CACHING</code> and <code>AUTOMATIC STORAGE</code>. Each table space had its own buffer pool, plus one buffer pool for the temporary table space. We used the different table spaces and buffer pools mainly for ease of monitoring. We later confirmed that combining all tables and all indexes into a single table space and a single large buffer pool produced almost the same performance (only 6 percent lower throughput than with the manual configuration).</p>
<P>

For the configuration of buffer pool sizes, sort heap, lock list, package cache, <code>num_iocleaners</code>, <code>num_ioservers</code>, and so on, we took the following approach. To avoid lengthy and repetitive tuning iterations, we simply <em>guessed</em> reasonable values for all these parameters. The numbers we chose were not intended or known to be optimal for this workload. They were only meant to be a starting point for DB2's self-tuning behavior. For example, we knew that we needed large buffer pools for the <code>custacc</code> and <code>order</code> tables, but we didn't know what sizes would be optimal. We decided to let DB2's self-tuning memory manager (STMM) figure out the optimal size. To help the STMM converge quickly towards the optimal buffer pool sizes, we didn&rsquo;t want to start with the default of 1,000 pages, which we knew was far too small. Instead, we first set the buffer pool for the <code>custacc</code> table to 770,000 pages and <em>then</em> to <code>automatic</code>, so that fewer iterative STMM adjustments would be needed to reach the optimal size than if we had started with 1,000 pages. The parameters <code>INSTANCE_MEMORY</code> and <code>DATABASE_MEMORY</code> were also set to <code>automatic</code>.</p>
<P>
You can see our complete <a href="http://tpox.svn.sourceforge.net/viewvc/tpox/TPoX/DB2/ddl/createtpox_1TB.sql" target="_blank">DDL script</a> in the TPoX open-source repository at <a href="http://tpox.svn.sourceforge.net/viewvc/tpox/TPoX/DB2/ddl/createtpox_1TB.sql" target="_blank">http://tpox.svn.sourceforge.net/viewvc/tpox/TPoX/DB2/ddl/createtpox_1TB.sql</a>.</p>
<h3>Performance Results</h3>
<P>

Now let's look at the following types of results:</p>
<ul type="disc">
  <li>Storage consumption and compression</li>
  <li>Transaction throughput of the mixed workload (70       percent queries, 30 percent insert/update/delete)</li>
  <li>Buffer pool hit ratios</li>
  <li>Out-of-the-box performance with minimal manual       configuration or tuning</li>
  <li>Scalability from 4-core to 6-core CPUs</li>
  <li>Incremental insert performance</li>
</ul>
				
					<h3>Storage  Consumption and Compression Results</h3>
<P>

Since the <code>security</code> table is  very small (20,833 documents) we examined the space consumption and compression  ratio mainly for the two large tables, <code>custacc</code> and <code>order</code> (see Table 3). The 60 million <code>custacc</code> documents are compressed  by 64 percent and require 121.4GB in a DB2 table space. The 300 million <code>order</code> documents are compressed by 57 percent and occupy 269.2GB in DB2. Including all  data and indexes,  the final database size was about 440GB. XML inlining  and compression were critical to avoid I/O bottlenecks.</p>
<table border="1" cellspacing="0" cellpadding="0">
  <tr>
    <td width="91" valign="top">
<P>
<strong>Table</strong></p></td>
    <td width="109" valign="top">
<P>
<strong>No. of XML documents</strong></p></td>
    <td width="103" valign="top">
<P>
<strong>No. of 16KB</strong><br>
            <strong>pages used</strong></p></td>
    <td width="88" valign="top">
<P>
<strong>Size (GB)</strong></p></td>
    <td width="103" valign="top">
<P>
<strong>Compression ratio</strong></p></td>
    <td width="158" valign="top">
<P>
<strong>Size of </strong><br>
            <strong>Indexes</strong></p></td>
  </tr>
  <tr>
    <td width="91" valign="top">
<P>
custacc</p></td>
    <td width="109" valign="top">
<P>
60,000,000</p></td>
    <td width="103" valign="top">
<P>
7,959,808 </p></td>
    <td width="88" valign="top">
<P>
121.4 GB</p></td>
    <td width="103" valign="top">
<P>
64%</p></td>
    <td width="158" valign="top">
<P>
10.3 GB</p></td>
  </tr>
  <tr>
    <td width="91" valign="top">
<P>
order</p></td>
    <td width="109" valign="top">
<P>
300,000,000</p></td>
    <td width="103" valign="top">
<P>
17,643,104</p></td>
    <td width="88" valign="top">
<P>
269.2 GB</p></td>
    <td width="103" valign="top">
<P>
57%</p></td>
    <td width="158" valign="top">
<P>
39.3 GB</p></td>
  </tr>
</table>
<P>

<strong>Table 3.</strong> Space consumption and compression.</p>
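<P>
The figures in Table 3 can be cross-checked with simple arithmetic. The following Python sketch rests on two assumptions that the article does not state explicitly: sizes are reported in binary gigabytes (1GB = 1024<sup>3</sup> bytes), and the compression ratio is the fraction of space saved. All names in the sketch are hypothetical.</p>

```python
PAGE_SIZE_BYTES = 16 * 1024  # 16KB pages, as used for the TPoX database
GIB = 1024 ** 3              # assumption: sizes are binary gigabytes

# Page counts, compression savings, and index sizes copied from Table 3.
TABLE3 = {
    "custacc": {"pages": 7959808, "savings": 64, "index_gb": 10.3},
    "order": {"pages": 17643104, "savings": 57, "index_gb": 39.3},
}


def pages_to_gib(pages):
    """Convert a number of 16KB pages into binary gigabytes."""
    return pages * PAGE_SIZE_BYTES / GIB


def uncompressed_gib(compressed_gib, savings_percent):
    """Estimate the size before compression, reading the ratio as space saved."""
    return compressed_gib / (1.0 - savings_percent / 100.0)


def database_size_gib():
    """Table data plus indexes for the two large tables."""
    return sum(pages_to_gib(t["pages"]) + t["index_gb"] for t in TABLE3.values())
```

<P>
Under these assumptions, the page counts reproduce the reported table sizes to within rounding, the two tables plus their indexes add up to about 440GB, and the estimated uncompressed sizes (roughly 337GB and 626GB) are in line with a raw data volume of about 1TB.</p>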
<h3>Transaction Throughput of a Mixed  Workload</h3>
<P>

Figure 3 shows the transaction throughput for the mixed workload on the 24-core, 2.66GHz Intel Xeon 7400 platform. The horizontal axis shows the number of concurrent users that the TPoX workload driver simulated. Each user issues a stream of transactions without think time between transactions. The blue curve represents transactions per second and belongs to the vertical axis on the left. The pink curve indicates the CPU utilization and belongs to the vertical axis on the right. Throughput and CPU utilization grow as the number of concurrent users increases. When the number of users is increased from 100 to 150 and then to 200, the CPU utilization approaches the maximum capacity of the system and the throughput consequently flattens out. At 200 users, the maximum throughput is 6763 TPoX transactions per second.</p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/clip_image005.gif" alt="Figure 3. Transactions per second and CPU utilization." width="479" height="339" border="0"><br>
    <strong>Figure 3.</strong> Transactions per second and CPU utilization.</p>
<P>

Increasing the number of users beyond 200 did not lead to higher throughput, only to longer transaction response times. The flattening throughput curve and the exhaustion of the system capacity at 200 users are directly related to the fact that all simulated users submit transactions without think time between one transaction and the next. If each user submitted, for example, one transaction per second, then the system could support thousands of users.</p>
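<P>
The relationship between users, throughput, and think time can be sketched with the interactive response time law for a closed system, N = X &times; (R + Z), where N is the number of users, X the throughput, R the response time, and Z the think time. The Python sketch below is a back-of-the-envelope illustration, not part of the benchmark. It assumes that throughput and response time stay at their saturation values, and the function names are hypothetical.</p>

```python
def mean_response_time(users, throughput_tps, think_time_s=0.0):
    """Solve N = X * (R + Z) for the mean response time R, in seconds."""
    return users / throughput_tps - think_time_s


def users_supported(throughput_tps, response_time_s, think_time_s):
    """Solve N = X * (R + Z) for the number of users N."""
    return throughput_tps * (response_time_s + think_time_s)
```

<P>
With 200 users and 6763 transactions per second, the implied mean response time is about 0.03 seconds. If each user instead completed one transaction per second (R + Z = 1 second), the same throughput would correspond to roughly 6,700 users.</p>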
<P>

Figure 4 shows the output of the workload driver for the mixed workload with 200 users and a test duration of two hours. The detailed statistics for all 17 transactions in the workload mix include their maximum and average response times as well as their &quot;count,&quot; which is the number of times each transaction was executed across all 200 users. About 48.6 million transactions were executed in the two-hour test. All average transaction response times are less than 0.1 seconds. Since the workload driver was run on a separate client machine, the response times include the network round-trip time.</p>
<P>
<div style="border:solid 1px #000;">
  
<P>
*** WORKLOAD STATISTICS ***</p>
  
<pre>
Tr.#  Name                Type     Count  %-age  Total Time(s)  Min Time(s)  Max Time(s)  Avg Time(s)
1     Get_order           Q      4859631  10.00       84490.77         0.00         0.37         0.02
2     Get_security        Q      4855112   9.99       31999.10         0.00         0.21         0.01
3     Customer_profile    Q      4863296  10.01       79068.06         0.00         0.21         0.02
4     Search_securities   Q      4861991  10.01      286924.50         0.00         0.54         0.06
5     Account_summary     Q      4855457   9.99       86128.72         0.00         0.36         0.02
6     Get_security_price  Q      4859441  10.00       30378.65         0.00         0.15         0.01
7     Customer_max_order  Q      4856963   9.99      253992.34         0.00         0.26         0.05
8     U1CloseAccount      U       485654   1.00       15431.24         0.00         1.68         0.03
9     U2OpenAccount       U       486821   1.00       31283.13         0.00         1.94         0.06
10    U3SecurityPrice     U      1458598   3.00       33801.16         0.00         0.21         0.02
11    U4OrderStatus       U      1460055   3.00       61331.56         0.00         0.58         0.04
12    U5BuySecurity       U      1457954   3.00       55542.71         0.00         1.83         0.04
13    U6SellSecurity      U      1457241   3.00       54253.77         0.00         1.69         0.04
14    delcustacc          D       485762   1.00       14893.65         0.00         0.21         0.03
15    delorder            D      3400792   7.00      105141.35         0.00         0.69         0.03
16    insValidcustacc     I       487083   1.00       14013.69         0.00         1.87         0.03
17    insNoValidorder     I      3403245   7.00       67252.42         0.00         0.93         0.02
                              48,595,096
</pre>
  
<P>
The throughput is 405804 transactions per minute  (6763.42 per second).</p>
</div>
<P>

<strong>Figure 4.</strong> Detailed transaction results at 200 users. </p>
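<P>
The columns in Figure 4 are related by simple formulas, which makes the output easy to sanity-check. The Python sketch below copies three rows and the totals from Figure 4. It assumes that the average time is the total time divided by the count and that the percentage is the count divided by the overall total; the helper names are hypothetical.</p>

```python
# A few rows copied from Figure 4: (name, count, total_time_s).
ROWS = [
    ("Get_order", 4859631, 84490.77),
    ("Search_securities", 4861991, 286924.50),
    ("insNoValidorder", 3403245, 67252.42),
]
TOTAL_COUNT = 48595096          # sum of all 17 counts in Figure 4
TRANSACTIONS_PER_MINUTE = 405804


def avg_time(count, total_time_s):
    """Average response time in seconds for one transaction type."""
    return total_time_s / count


def share_percent(count, total=TOTAL_COUNT):
    """Share of one transaction type in the executed mix, in percent."""
    return 100.0 * count / total


def per_second(per_minute):
    """Convert a per-minute rate into a per-second rate."""
    return per_minute / 60.0
```

<P>
For example, <code>Get_order</code> averages about 0.02 seconds and accounts for 10.00 percent of all executed transactions, which matches its weight in Table 1, and 405804 transactions per minute correspond to about 6763 per second.</p>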
<P>

Optionally, the workload driver can print such a transaction summary every <em>n</em> minutes during the test period. This allowed us to confirm that the throughput remained stable throughout the test. The workload driver can also calculate the 90th, 95th, or 99th percentile of the transaction response times. Percentiles are useful if you want to confirm that 90 percent, 95 percent, or 99 percent of the transaction response times are below a certain threshold.</p>
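<P>
To illustrate what such a percentile check involves, here is a minimal Python sketch that uses the nearest-rank method. How the workload driver actually computes percentiles is not described in this article, so the method and the names <code>percentile</code> and <code>meets_target</code> are assumptions made for illustration.</p>

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile.

    Returns the smallest sample such that at least p percent of all
    samples are at or below it.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def meets_target(samples, p, threshold_s):
    """True if the p-th percentile does not exceed the threshold (seconds)."""
    return threshold_s >= percentile(samples, p)
```

<P>
A call such as <code>meets_target(response_times, 95, 0.1)</code> would then answer whether 95 percent of the recorded response times stayed at or below 0.1 seconds.</p>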
<P>

Remember  that the workload driver is freely available as open source and can be used to  run any kind of SQL, SQL/XML, or XQuery workload that you define. It's a very  versatile tool for all sorts of database performance testing. </p>
<h3>Buffer Pool Performance under &ldquo;Self  Tuning Memory Management&rdquo;</h3>
<P>

Adjusted by DB2's self-tuning memory manager, the combined size of all buffer pools reached 46GB (out of 64GB of physical memory). Since the database size after compression was about 440GB, the ratio between buffer pools and database size is 10.5 percent (46GB/440GB). Figure 5 shows that the buffer pools for the <code>custacc</code> and <code>order</code> indexes had hit ratios between 95 and 100 percent. The hit ratio for the <code>custacc</code> and <code>order</code> tables was between 60 percent and 70 percent. Without DB2's compression, this hit ratio would have been lower and performance would have been worse.</p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/clip_image007.gif" alt="Figure 5. Buffer pool hit ratios. " width="313" height="216" border="0"><img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/clip_image009.gif" alt="Figure 5. Buffer pool hit ratios. " width="313" height="216" border="0"> <br>
    <strong>Figure 5.</strong> Buffer pool hit ratios. </p>
<h3>Out-of-the-Box DB2 Performance</h3>
<P>

How difficult is it to tune DB2 for the performance that we achieved in this test? For example, is it really necessary to have five separate table spaces and buffer pools for the different tables and indexes? Can we achieve similar performance with a database setup that is much simpler than the <a href="http://tpox.svn.sourceforge.net/viewvc/tpox/TPoX/DB2/ddl/createtpox_1TB.sql" target="_blank">DDL script</a> that we used initially?</p>
<P>

In an  attempt to answer these questions, we repeated the benchmark and configured the  DB2 database with just four simple steps:</p>
<ul type="disc">
  <li>Create the database with automatic storage</li>
  <li>Change the log location to a separate storage path</li>
  <li>Create a separate buffer pool for the temporary       table space</li>
  <li>Use DB2's <code>AUTOCONFIGURE</code> command       to let DB2 tune itself.</li>
</ul>
<P>

These  steps are shown in Figure 6. Note that this database uses just a single default table  space and a single default buffer pool for all tables and indexes. With this setup, the mixed workload  with 200 users reached 6368 transactions per second, which is only 6 percent  lower than the 6763 TPS that we achieved with the more detailed database  configuration. This result shows that high-end performance does not always  require expert database tuning and that DB2's autonomic capabilities work remarkably  well.</p>
<div style="border:solid 1px #000;">
  
<P>
CREATE DATABASE tpox <br>
    AUTOMATIC STORAGE YES ON  /mnt/xdb1, /mnt/xdb2, /mnt/xdb3, /mnt/xdb4, /mnt/xdb5, /mnt/xdb6, /mnt/xdb7,  /mnt/xdb8 PAGESIZE 16 K <br>
    USER TABLESPACE MANAGED BY  AUTOMATIC STORAGE NO FILE SYSTEM CACHING INITIALSIZE 256 G INCREASESIZE 64G;</p>
  
<P>
UPDATE DB CFG FOR tpox using  NEWLOGPATH /logfile ;</p>
  
<P>
CREATE BUFFERPOOL &quot;TEMPBP&quot;&nbsp; SIZE 12500 PAGESIZE 16384; <br>
    ALTER TABLESPACE TEMPSPACE1  BUFFERPOOL TEMPBP;</p>
  
<P>
-- create tables and indexes, load the tables, then run autoconfigure:</p>
  
<P>
AUTOCONFIGURE USING <br>
    MEM_PERCENT 80 WORKLOAD_TYPE  simple NUM_STMTS 1 TPM 200000 <br>
    ADMIN_PRIORITY performance IS_POPULATED  yes NUM_LOCAL_APPS 0 <br>
    NUM_REMOTE_APPS 200  ISOLATION cs BP_RESIZEABLE yes APPLY DB AND DBM;</p>
</div>
<P>

<strong>Figure 6. </strong>Configuring the database in five commands.</p>
<P>
<h3>Scalability on Multi-Core CPUs</h3>
<P>

Figure  7 compares the throughput measured in three different cases.  From left to right they are</p>
<ol start="1" type="1">
  <li>150 concurrent users, four Intel Xeon 7300 CPUs (16       cores total, 2.93 GHz)</li>
  <li>150 concurrent users, four Intel Xeon 7400 CPUs (24       cores total, 2.66 GHz)</li>
  <li>200 concurrent users, four Intel Xeon 7400 CPUs (24       cores total, 2.66 GHz)</li>
</ol>
<P>

In the first test, the Intel Xeon 7300 quad-core CPUs are saturated with 150 concurrent users. The workload achieves a maximum throughput of 4558 transactions per second at 99.3 percent CPU utilization. In test 2, moving from the quad-core (Xeon 7300) to the six-core (Xeon 7400) CPUs increases the transaction rate for 150 users by 42 percent at only 84.7 percent CPU utilization. Since the machine is not saturated, test 3 increases the number of users to 200. This leads to 95.2 percent CPU usage and 6763 transactions per second, a 48 percent performance gain of the six-core over the quad-core processors.</p>
<P>

The performance gains of 1.42x and 1.48x in Figure 7 are remarkable because, considering only the number of cores and the clock speed, the Intel Xeon 7400 CPUs are expected to provide about 1.36x the performance of the Intel Xeon 7300 CPUs. The additional speed-up is mainly due to the 16 MB L3 cache, which is new in the Intel Xeon 7400 Series processors. Equally important is the fact that the Intel Xeon 7400 CPUs provide higher performance while maintaining the same power consumption as the Intel Xeon 7300 CPUs. Increased performance per watt is important to make computing more economical and cost-effective.</p>
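<P>
The 1.36x expectation follows from multiplying core count by clock speed for each system. The short Python sketch below reproduces that estimate and the measured gain at 200 users from the figures quoted above. The helper name <code>core_ghz</code> is ours, and the model deliberately ignores cache, memory bandwidth, and every other architectural difference.</p>

```python
def core_ghz(cores, ghz):
    """Naive capacity estimate: number of cores times clock speed."""
    return cores * ghz


# Xeon 7400 system (24 cores, 2.66 GHz) versus Xeon 7300 system (16 cores, 2.93 GHz).
expected_speedup = core_ghz(24, 2.66) / core_ghz(16, 2.93)  # about 1.36

# Measured throughput: 6763 TPS (test 3) versus 4558 TPS (test 1).
measured_speedup = 6763 / 4558  # about 1.48

# Gain beyond what cores and clock speed alone would predict.
extra_gain = measured_speedup / expected_speedup
```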
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/clip_image011.gif" alt="Figure 7 . Scalability from Intel quad-core to six-core CPUs. " width="480" height="317" border="0"> <br>
    <strong>Figure 7. </strong>Scalability from Intel quad-core to six-core CPUs. </p>
<h3>XML Insert Performance</h3>
<P>

Inserting rows into an empty table with empty indexes can be faster than inserting into a table that already contains a large volume of data. To get a meaningful assessment of XML insert performance, we measured an insert-only workload on top of the populated 1TB database. The insert test added 2 million XML documents to the <code>custacc</code> table and 3 million documents to the <code>order</code> table (see Figure 8). Both tables have two XML indexes. XML Schema validation was not performed. The <code>custacc</code> documents were inserted at a rate of about 4,900 documents per second, which amounts to about 100 GB/hour. The smaller <code>order</code> documents were inserted at 11,900 documents per second, or 69 GB/hour. For both types of documents, the insert tests used 600 concurrent users that issued insert statements without think time. A commit was performed after every single insert. Less frequent commits or using DB2's load utility can provide even higher XML ingestion rates.</p>
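<P>
The insert rates above also imply an average document size and a test duration, which the following Python sketch derives. These are back-of-the-envelope estimates, not measured values, and they assume decimal gigabytes (10<sup>9</sup> bytes); the helper names are ours.</p>

```python
GB = 10**9  # assumption: decimal gigabytes


def avg_doc_bytes(gb_per_hour, docs_per_second):
    """Average document size implied by a data rate and a document rate."""
    return gb_per_hour * GB / (docs_per_second * 3600)


def minutes_to_insert(doc_count, docs_per_second):
    """Elapsed time in minutes to insert doc_count documents at a steady rate."""
    return doc_count / docs_per_second / 60


custacc_size = avg_doc_bytes(100, 4900)   # roughly 5.7 KB per document
order_size = avg_doc_bytes(69, 11900)     # roughly 1.6 KB per document

custacc_minutes = minutes_to_insert(2_000_000, 4900)   # roughly 6.8 minutes
order_minutes = minutes_to_insert(3_000_000, 11900)    # roughly 4.2 minutes
```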
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/clip_image013.gif" alt="Figure 8. Incremental XML insert performance." width="380" height="283" border="0"> <br>
  <strong>Figure 8. </strong>Incremental XML insert performance. </p>
<h3>Lessons  Learned</h3>
<P>

What  did we learn from the 1TB XML performance study? Apart from the actual  performance and scalability results, several observations are valuable.</p>
<P>

One of the lessons learned is that tuning DB2 for XML-based transaction processing is not very hard. The strategy of using DB2&rsquo;s autonomic and self-tuning features as much as possible proved to be very successful. Within a reasonable amount of time, we were not able to achieve higher performance with manual tuning than with automatic tuning.</p>
<P>

A prerequisite for good performance is well-balanced hardware, that is, using &quot;the right&quot; ratio between number of CPU cores, main memory, and number of disks. With 24 cores, 64 GB of memory, and 120 data disks, our test system has 5 disks per core and 2.67 GB of memory per core. The optimal ratio is workload dependent. In the TPoX mixed workload we observed an average of 1.7 physical I/O requests per transaction. Hence, at a peak transaction rate of 6763 TPS, the storage system had to sustain about 11,500 I/O operations per second (IOPS). Following the rule of thumb that a modern SCSI disk can support about 100 IOPS with reasonably low latencies, about 115 disks are needed to avoid I/O bottlenecks and allow high CPU utilization.</p>
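<P>
The sizing argument in this paragraph is simple enough to script. The Python sketch below reproduces it with the numbers quoted in the text. The variable names are ours, and the 100 IOPS per disk figure is the rule of thumb cited above rather than a measured value.</p>

```python
import math

# System configuration and workload figures quoted in the text.
cores, memory_gb, data_disks = 24, 64, 120
io_per_transaction = 1.7   # average physical I/O requests per transaction
peak_tps = 6763            # peak transactions per second
iops_per_disk = 100        # rule-of-thumb capacity of one SCSI disk

disks_per_core = data_disks / cores        # 5.0
memory_per_core_gb = memory_gb / cores     # about 2.67

required_iops = io_per_transaction * peak_tps            # about 11,500
disks_needed = math.ceil(required_iops / iops_per_disk)  # 115

# The 120 data disks leave a small margin over the estimated requirement.
spare_disks = data_disks - disks_needed
```

<P>
Plugging in your own I/O rate per transaction and target throughput gives a first estimate of the number of spindles to plan for, before any measurement on real hardware.</p>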
<P>

DB2  compression was critical. Without compression more disks and more memory would  have been required to achieve the same performance. Compression reduced the  required I/O, which is a benefit that far outweighed the extra CPU cycles to  compress and decompress data.</p>
<P>

To understand the database performance behavior, it proved very useful to use the DB2 snapshot monitor and take snapshots at regular intervals, such as every 5 minutes. For example, the collected data allows you to analyze I/O and page cleaning behavior over time.</p>
<P>

For  Linux and UNIX systems, DB2 9.5 has a fundamentally different process model  than DB2 9.1. While DB2 9.1 spawns a separate <em>process</em> for each agent,  DB2 9.5 runs as a single process with one thread per agent. Our results confirm  that DB2's threaded engine exploits multi-core CPUs very well and achieves good  speed-up from 4-core to 6-core Intel Xeon Processors.</p>
<P>

<em><strong>Agustin Gonzalez</strong> works at Intel Corp. as a senior staff software engineer in the Software and Solutions Group, where he works on performance enablement for Intel Xeon platforms. He previously worked at several startup and public companies, amassing more than 15 years of experience in large-scale data management systems, performance optimization, and commercial software development. </em></p>
<P>

<em><strong>Matthias  Nicola</strong> is a senior software engineer for DB2 pureXML at IBM's Silicon Valley Lab. His work focuses on all  aspects of XML in DB2, including XQuery, SQL/XML, storage, indexing and  performance. Matthias also works closely with customers and business partners,  assisting them in the design, implementation, and optimization of XML  solutions. Prior to joining IBM,  Matthias worked on data warehousing performance for Informix Software. </em></p>

				]]></body>
		</item>
	
		<item>
			<title><![CDATA[XML Storage Options in DB2]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216403288&cid=RSSfeed]]></link>
			<description><![CDATA[How to decide which XML storage option best suits a particular application.]]></description>
			<pubDate>Tue, 7 Apr 2009 19:03:00 EDT</pubDate>
			<keywords><![CDATA[XML Storage in Relational Databases, pureXML, XML Shredding, XML Stuffing, native XML, DB2 9]]></keywords>
			<blurb><![CDATA[How to decide which XML storage option best suits a particular application.]]></blurb>
			<authors><![CDATA[Conor O'Mahony]]></authors>
			<body><![CDATA[
			
					
<P>
DB2 provides several options for storing and managing XML  data. For many applications, the best option is to use pureXML, which stores  XML data in its native XML format. However, in certain situations, other  storage options can be beneficial. This article explains the XML storage  options in DB2 so that you can choose the best storage method for your  application. Remember, it&rsquo;s typically not feasible to change the storage method  after a project is implemented, so it&rsquo;s well worth your time to carefully choose  the storage method before a project begins.</p>
<h3>XML  is Pervasive</h3>
<P>

XML debuted  a little more than a decade ago, and has quickly become a leading data format. Its  platform independence and the fact that XML is based on international standards  make it ideal for data exchange.</p>
<P>

There  are numerous industry-specific standards that are based on XML, like ACORD for  insurance, ARTS for retail, FpML for derivatives trading, and XBRL for  financial reporting, to name just a few. These standards allow organizations to  easily share and exchange information. They also allow software companies to  develop standards-based tools and solutions, which make it easier and more  affordable for organizations to assemble their IT infrastructure.</p>
<h3>DB2  pureXML: Storing XML Data Using the XML Data Type</h3>
<P>

<P>
  There  was much fanfare when DB2 introduced the pureXML feature to the market in 2006.  Forbes.com wrote that &ldquo;IBM is &hellip;taking  a more holistic approach than its competitors to combine XML and relational  systems.&rdquo; The Data Administration Newsletter (TDAN) wrote that &ldquo;DB2/Viper will  be able to process XML more efficiently than the other major DBMS players (e.g.  Oracle and Microsoft).&rdquo;</p>
<P>

<P>
  Since then, the fanfare has continued as customers have seen for themselves how pureXML makes it easier for developers and data administrators to store and work with XML data. For instance, UCLA Health Services reported a nearly 83 percent decrease in the number of database tables that administrators must manage and a 70 percent reduction in the number of database staff needed to add new schemas and data to the system. Storebrand Group reported that programming search processes require only 30 minutes with pureXML; previously this task took between 2 and 8 hours. Updating XML schemas is also much faster: it now takes five minutes, compared to one week with a shredding-based approach.</p>
<P>

<P>
  When  storing XML data using pureXML, DB2 treats the XML data as a tree of XML  elements and attributes, instead of treating the data as a string or mapping  the data into a relational format. The name &ldquo;pureXML&rdquo; was chosen because DB2  actually stores and works with the XML in its purest format rather than attempting  to retrofit it into the existing relational database infrastructure. IBM built  a new native XML capability into DB2 that works hand-in-hand with the existing relational  infrastructure. Using such an approach avoids many performance and  administrative challenges associated with XML storage.</p>
<P>

<P>
  Figure  1 shows the DB2 storage architecture. Notice how the XML-optimized storage is separate  from the relational data storage. Common services occur across both types of  data, providing language flexibility. In other words, you can use SQL to access  the data, or you can use XQuery to retrieve data.</p>
<P>

<img width="359" height="386" src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/clip_image002_0001.jpg"><br>
    <strong>Figure 1.</strong> DB2&rsquo;s hybrid XML/relational  data storage architecture.</p>
<P>

There  is one important exception to the model presented in this diagram: You can  optionally choose to store the XML data &ldquo;inline&rdquo; in a relational row. When you  do this, the XML data is still stored using the pureXML parsed tree format; the  only difference is that it&rsquo;s placed in a different physical location. This  location is transparent to the application and user. Because the XML data is  stored in the relational row, you can take advantage of DB2&rsquo;s deep compression  to further compress the stored XML data and improve runtime performance of  queries that are I/O bound. &nbsp;&nbsp;</p>
<P>

Let&rsquo;s  look at the other approaches to storing XML data in DB2 before going into more  detail on when one is more advantageous than the others.</p>
<h3>Large  Object Storage
</h3>
<P>

<P>
  A  simple way to store XML data in a relational database is to place the XML data &ldquo;as-is&rdquo;  into a single field whose data type is set to Character Large Object (CLOB) or  Binary Large Object (BLOB). This process is sometimes referred to as &ldquo;stuffing.&rdquo;  Figure 2 shows the stuffing of XML data into a relational database.</p>
<P>

<img width="329" height="374" src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/clip_image004_0000.jpg"><br>
    <strong>Figure 2.</strong> Storing XML as a large object  (also known as &ldquo;stuffing&rdquo;). </p>
<P>

With  stuffing, the XML data is placed into a single cell. The data is stored as either  character or binary data, not as XML data. This approach makes it easy to store  and retrieve the data. Stuffing is a good approach when all you want to do is  retrieve the entire XML data &ldquo;as is.&rdquo; </p>
<P>

However,  if you want to issue queries against individual pieces of information in the  XML data, you may encounter slower query response than if you used pureXML  storage. Because the XML data is being stored in the database as a character or  binary object, the database doesn&rsquo;t know that it is, in fact, XML data. And you  can&rsquo;t use the database to issue a query directly against specific tags in the  XML data. You need to retrieve the character object or binary object in its entirety  and then work with it. </p>
<P>

For  example, imagine a typical XML data record in your database looks like the  following:</p>
<P>

&lt;employee&gt;<br>
  &nbsp;&nbsp;&nbsp;  &lt;name&gt;John Doe&lt;/name&gt;<br>
  &nbsp;&nbsp;&nbsp; &lt;address&gt;<br>
  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;street&gt;123 Main Street&lt;/street&gt;<br>
  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;city&gt;New York&lt;/city&gt;<br>
  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;state&gt;New York&lt;/state&gt;<br>
  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;zip&gt;10003&lt;/zip&gt;<br>
  &nbsp;&nbsp;&nbsp; &lt;/address&gt;<br>
  &nbsp;&nbsp;&nbsp; &hellip;<br>
  &lt;/employee&gt;</p>
<P>

Now, if you want to run a query for all employees in New York City, your application will need to read each character object or binary object from the database, parse the XML data in that character object or binary object to &ldquo;understand&rdquo; the structure of the XML data, and then determine which objects have a &ldquo;&lt;city&gt;&rdquo; element with a value of New York. It can take a lot of computation to read each object, parse the XML data, and then determine the objects that meet the search criteria.</p>
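<P>
To make that cost concrete, here is a minimal Python sketch of what the application ends up doing with stuffed XML. The dictionary <code>stored_clobs</code> stands in for rows fetched from a CLOB column, and the second employee record and the function name are invented for illustration. Notice that every document is fully parsed, even those that do not match.</p>

```python
import xml.etree.ElementTree as ET

# Stand-in for CLOB values fetched from the database, keyed by row id.
stored_clobs = {
    1: ("<employee><name>John Doe</name><address>"
        "<street>123 Main Street</street><city>New York</city>"
        "<state>New York</state><zip>10003</zip></address></employee>"),
    2: ("<employee><name>Jane Roe</name><address>"
        "<street>45 Elm Street</street><city>Boston</city>"
        "<state>Massachusetts</state><zip>02108</zip></address></employee>"),
}


def employees_in_city(clobs, city):
    """Return the names of employees whose address/city equals `city`."""
    matches = []
    for row_id, xml_text in clobs.items():
        root = ET.fromstring(xml_text)  # full parse of every stored object
        if root.findtext("address/city") == city:
            matches.append(root.findtext("name"))
    return matches


new_yorkers = employees_in_city(stored_clobs, "New York")
```

<P>
The work grows with the total volume of stored XML rather than with the number of matching rows, because the database cannot use an index on a value it does not know is there.</p>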
<P>

Stuffing  is a good approach as long as you don&rsquo;t need to issue queries against  information in the XML data or work with individual pieces of information in  the XML data. If all you need to do is store and retrieve the entire XML data &ldquo;as  is,&rdquo; without working with the actual XML itself, then the simplicity of  stuffing is appealing.</p>
<h3>Shredding  for Fast Query Performance</h3>
<P>

<P>
  A  popular approach to storing XML data is to &ldquo;shred&rdquo; it into a relational  database. Shredding is a process in which the individual elements, attributes, and  pieces of information in the XML data are mapped into separate fields in relational  tables. Figure 3 shows the shredding of XML data into a relational database.</p>
<P>

<img width="309" height="356" src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/clip_image006_0000.jpg"><br>
    <strong>Figure 3.</strong> Shredding XML.</p>
<P>

Because  each piece of information in the XML data is put into a separate relational  field, you can leverage the relational query optimization in DB2 to run fast queries  against individual pieces of information from the XML data.</p>
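<P>
The following Python sketch illustrates the idea on the employee record shown earlier. It uses the standard-library <code>sqlite3</code> module purely as a stand-in for a relational database, so nothing here is DB2-specific; the table layout, column names, and the <code>shred</code> function are assumptions chosen for this example.</p>

```python
import sqlite3
import xml.etree.ElementTree as ET

employee_xml = (
    "<employee><name>John Doe</name><address>"
    "<street>123 Main Street</street><city>New York</city>"
    "<state>New York</state><zip>10003</zip></address></employee>"
)

# In-memory SQLite database as a stand-in for the relational store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, street TEXT, "
    "city TEXT, state TEXT, zip TEXT)"
)


def shred(connection, xml_text):
    """Parse one document and map its elements to relational columns."""
    root = ET.fromstring(xml_text)
    connection.execute(
        "INSERT INTO employee (name, street, city, state, zip) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            root.findtext("name"),
            root.findtext("address/street"),
            root.findtext("address/city"),
            root.findtext("address/state"),
            root.findtext("address/zip"),
        ),
    )


shred(conn, employee_xml)

# After shredding, the lookup is an ordinary relational query.
rows = conn.execute(
    "SELECT name FROM employee WHERE city = ?", ("New York",)
).fetchall()
```

<P>
The parsing cost is paid once, at insert time, and the mapping code has to be kept in step with the XML schema from then on.</p>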
<P>

Of  course, the process of parsing the XML data, taking the individual pieces of  information from it, and then mapping those pieces of information into the database  fields means that inserting the XML data into the database may take longer than  other approaches. If you&rsquo;re shredding a relatively small amount of XML data  into a handful of tables and columns, the additional time required for  shredding the data at insert time may not be significant. The data insert  overhead of shredding typically is an issue only if you&rsquo;re working with complex  schemas that map to hundreds of tables and columns. Even then, the performance impact  may either be relatively minor or it may be acceptable in light of the query  performance gains.</p>
<P>

If  you need to reconstitute the original XML data for whatever reason, then you  either need to reconstruct it from the individual cells or you need to also store  the entire XML data as a large object in the database. If you reconstruct the  XML data from the individual cells, your runtime performance may suffer. You should  also be aware that you can&rsquo;t guarantee that you&rsquo;ll be able to re-constitute the  XML data into its exact original form, which may be a problem for environments in  which maintaining the integrity of data is important for audit purposes. If you  store the entire original XML data as a large object in the database, you will  increase your storage requirements. In many cases, these issues are also  relatively minor in comparison to the query performance benefits of shredding.</p>
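<P>
A small Python sketch shows why exact reconstruction cannot be guaranteed. The document, the two-column mapping, and the comment inside the XML are all invented for illustration. The rebuilt document carries the same mapped values, but the comment and the original whitespace are gone because they were never stored.</p>

```python
import xml.etree.ElementTree as ET

original = (
    "<employee>\n"
    "  <!-- verified by HR -->\n"
    "  <name>John Doe</name>\n"
    "  <address><city>New York</city></address>\n"
    "</employee>"
)

# Shredding keeps only the mapped values (hypothetical two-column mapping).
root = ET.fromstring(original)
row = {"name": root.findtext("name"), "city": root.findtext("address/city")}

# Reconstruction builds a new document from the relational values alone.
rebuilt_root = ET.Element("employee")
ET.SubElement(rebuilt_root, "name").text = row["name"]
address = ET.SubElement(rebuilt_root, "address")
ET.SubElement(address, "city").text = row["city"]
rebuilt = ET.tostring(rebuilt_root, encoding="unicode")

same_values = ET.fromstring(rebuilt).findtext("address/city") == row["city"]
identical_text = rebuilt == original
```

<P>
If an auditor needs the document exactly as it was received, equal values are not enough, which is why some shredding designs also keep the original as a large object.</p>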

				
					
<P>
A potentially  more significant drawback of shredding is the impact on resources for software  development and database administration. Shredding typically results in more  complex database administration and software development projects. The  additional effort to coordinate the shredding of XML data is incurred when you  first set up the system as well as each time you update the system. You must  create a relational database schema for the individual pieces of information  that you are extracting from the XML data. With some complex real world  applications, the overhead required to create and maintain such data schemas  can be considerable. &nbsp;</p>
<P>

For instance,  consider the shredding of the schema for the Financial Products Markup Language  (FpML) industry standard, which is commonly used in derivatives trading  environments. Because the FpML standard represents complex data with literally  hundreds of different possible pieces of information, the corresponding normalized  relational database can have hundreds of corresponding columns. Now consider  the following:</p>
<ul>
  <li>Developers must navigate       this complex mapping whenever they work with the data</li>
  <li>Database administrators       must manage these new columns.</li>
</ul>
<P>

Also, keep in mind that many industry standard formats evolve over time, sometimes changing their XML schema at regular intervals. If you&rsquo;re working with an XML schema that changes frequently, you must update your mapping environment whenever the XML schema changes. This can reduce productivity for both software developers and database administrators on an ongoing basis.</p>
<P>

So,  in summary, shredding typically optimizes query performance at the cost of increased  resource needs for software development and database administration.</p>
<h3>Comparing XML Storage Options</h3>
<P>

In many cases, using pureXML is the best approach to your XML storage needs. By using pureXML, you can specify that the columns holding the information store XML data. You can then use query languages that were designed for working with XML data, such as SQL/XML and XQuery, which make life much easier for both database administrators and developers. Because the database &ldquo;knows&rdquo; that it is XML data and understands how to work with that XML data, you can issue efficient database queries directly against the individual pieces of information in the XML data. You can also take advantage of database features like compression that support XML data. The XML features in the database typically provide the most effective and efficient way to work with XML data.</p>
<P>

There are exceptions when an alternative approach like  shredding may be better. For instance, you may have an environment where query  performance is by far the most important consideration, even if achieving it  incurs increased software development and database administration costs in  terms of the complex mapping between the XML schema and the relational schema. In  such cases, you are willing to make these sacrifices in order to take advantage  of query performance.</p>
<P>

Note, however, that the query performance of shredding is  not always better than the performance of pureXML. Sometimes, shredding can be  inefficient. For example, if you need to retrieve the XML data in its original  form, then the database will need to remap all of the information from the  relational tables back into its original XML format, resulting in a relatively  slow response.</p>
<P>

DB2 pureXML currently has reasonably good performance. The  recent Transaction Processing over XML (TPoX) benchmark carried out by Intel  labs demonstrated a data insert performance of 100 GB/hour on a 4-CPU server. It  also showed a stable throughput of 6,763 XML-based transactions per second for  a mixed workload on that same server. So, although working with pureXML is not  as fast as working with individual pieces of information in &ldquo;traditional  relational&rdquo; columns, it is making good progress. </p>
<P>

There&rsquo;s another scenario where you may consider an  alternative approach. Imagine that you want to store an XML document and you do  not plan to issue queries against its contents. Perhaps you plan to simply  retrieve the XML document based upon an identifier in the database and then display  that XML data. Maybe, for example, you are storing data for audit compliance  reasons. In such cases, you do not need to work with the actual contents of the  XML data, and the most efficient and best performing approach to storing the  XML data may be to use the CLOB or BLOB data type. You will likely enjoy a  slight improvement in query performance, and improved data insert performance. &nbsp;</p>
<P>

Table 1 summarizes the advantages and disadvantages of the  three approaches. It also provides guidelines on when you should consider each  approach.</p>
<table border="1" cellspacing="0" cellpadding="0">
  <tr>
    <td width="80" valign="top">
<P>
<strong>&nbsp;</strong></p></td>
    <td width="167" valign="top">
<P>
<strong>Advantages</strong></p></td>
    <td width="168" valign="top">
<P>
<strong>Disadvantages</strong></p></td>
    <td width="175" valign="top">
<P>
<strong>When to Use</strong></p></td>
  </tr>
  <tr>
    <td width="80" valign="top">
<P>
<strong>pureXML</strong></p></td>
    <td width="167" valign="top"><ol>
      <li>Maintains the    fidelity of XML data</li>
      <li>Efficient and    effective native storage</li>
      <li>Efficient and    effective XML query languages</li>
      <li>Good    performance in many circumstances</li>
      <li>Support for    database features (like compression)</li>
      <li>Easier software    development</li>
      <li>Easier database    administration</li>
      <li>Easier    management of XML schemas</li>
      <li>Free download    code for many XML standards</li>
      <li>Better support    for evolving query needs</li>
    </ol>
        
<P>
&nbsp;</p></td>
    <td width="168" valign="top">
<P>
&nbsp;</p></td>
    <td width="175" valign="top">
<P>
Except for the cases    described below, pureXML should be the default choice for your XML storage    needs.</p></td>
  </tr>
  <tr>
    <td width="80" valign="top">
<P>
<strong>Stuffing</strong></p></td>
    <td width="167" valign="top"><ol>
      <li>Simple concept</li>
      <li>Maintains the    fidelity of XML data</li>
    </ol>
        
<P>
&nbsp;</p></td>
    <td width="168" valign="top"><ol>
      <li>Poor    performance for sub-document queries </li>
      <li>The database    does not &ldquo;understand&rdquo; the XML</li>
      <li>No management    of XML schemas</li>
    </ol>
        
<P>
&nbsp;</p></td>
    <td width="175" valign="top">
<P>
Use when you are confident    you will not want to issue queries against the contents of the XML data.</p></td>
  </tr>
  <tr>
    <td width="80" valign="top">
<P>
<strong>Shredding</strong></p></td>
    <td width="167" valign="top"><ol>
      <li>Often the best query performance</li>
    </ol>
        
<P>
&nbsp;</p></td>
    <td width="168" valign="top"><ol>
      <li>Does not maintain the fidelity of XML data</li>
      <li>More complex environment</li>
      <li>Requires more software to be written and maintained</li>
      <li>More complex for database administrators</li>
      <li>Performance cost when writing data to database</li>
      <li>Performance cost if reconstructing original XML data</li>
      <li>No management of XML schemas</li>
    </ol>
        
<P>
&nbsp;</p></td>
    <td width="175" valign="top">
<P>
Use when:</p>
        <ol>
            <li>You are working with applications or tools that only    have relational APIs.</li>
            <li>Query performance is by far the most important    consideration, and the shredding query performance is faster than pureXML.</li>
        </ol></td>
  </tr>
</table>
<P>

Before I finish, I must mention an excellent resource that  you should make sure to read. In the developerWorks article titled &ldquo;A  performance comparison of DB2 9 pureXML and CLOB or shredded XML storage&rdquo;,  Matthias Nicola and Vitor Rodrigues present a detailed analysis of the  performance of various aspects of working with XML data using the different  approaches available in DB2.</p>
<P>

<P>
<em><strong>Conor O'Mahony</strong> is program director for DB2 Product Marketing at IBM and writes the Native XML Database blog.</em></p>
				]]></body>
		</item>
	
		<item>
			<title><![CDATA[XML and the Database:  A Perspective for the DBA]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216402743&cid=RSSfeed]]></link>
			<description><![CDATA[With DB2 pureXML, DBAs can apply existing skills to manage XML growth and pursue opportunities to learn new XML skills if they choose.]]></description>
			<pubDate>Fri, 3 Apr 2009 16:11:00 EDT</pubDate>
			<keywords><![CDATA[DBA, XML in Relational Databases, pureXML, native XML, DB2 9]]></keywords>
			<blurb><![CDATA[With DB2 pureXML, DBAs can apply existing skills to manage XML growth and pursue opportunities to learn new XML skills if they choose.]]></blurb>
			<authors><![CDATA[Bryan Patterson and Dexiong Terry Zhang]]></authors>
			<body><![CDATA[
			
					
<P>
Once a database system is launched and stabilized for a set of applications, DBAs are often hesitant to make significant changes that could impact the availability or performance of existing applications that use the database.</p> 
<P>

<P>
XML can solve this problem. Companies and agencies across many industries are moving to XML as a data format for capturing and exchanging information about their business. The decision about how the XML data is stored in the database along with other important business data can significantly affect the DBA; however, DBAs often aren't actively involved in the decision.</p>  
<P>

<P>
This article briefly reviews the rise of XML and discusses the different ways relational database systems can be used to store and manage this data. We particularly focus on the new pureXML capabilities in DB2 9, which manage XML data in its native hierarchical form and simplify the XML data management challenge. We outline the new opportunities that pureXML gives database professionals to apply their existing skills to solve the XML management challenge and describe opportunities pureXML offers DBAs to enhance their skills and value to their organization by enabling participation in application discussions.</p>
<P>
<h3>Background: The Rise of XML</h3>
<P>

<P>
XML provides a flexible yet standardized way of defining data for exchange among different systems, platforms, applications, and organizations. XML is a royalty-free standard recommended by the World Wide Web Consortium (W3C). XML documents are known as self-describing since they use tags (markup) to describe the data values. The data and associated tags can be grouped and nested to express hierarchical relationships within the data. New tags can be created and used as business needs evolve, allowing a flexible way to expand the information that is represented. XML instance documents can optionally be validated against a schema document which provides a template for the structure and possible values of XML documents. These characteristics (standardized, self-describing, flexible, expandable, and platform-independent) have enabled XML to become a very popular choice as a data exchange format. The popularity of XML has resulted in many readily available tools, such as parsers and transformation engines, which make the receipt and processing of XML data within a business a straightforward activity.</p>
<P>

<P>
XML has been used to implement data format standards for business-to-business (B2B) and business-to-customer (B2C) information exchange and for modeling business activity within a company. There are many examples of XML standard formats that have been defined for B2B data exchange for specific industries (ACORD for insurance, FIXML for financial services, NIEM for government, HL7 for healthcare, and ARTS for retail) and cross-industry exchange (ebXML and UBL). One business-to-customer XML standard that is growing in popularity is XForms, a W3C Recommendation for processing and presenting data such as web forms. Internal application integration technologies such as service-oriented architectures (SOA) and web services (for example, SOAP and RSS) have also used XML as an underlying data format.</p>
<P>

<P>
With so many data models, standards, and technologies based on XML, companies and government agencies are increasingly dependent on XML for business processing. The information captured in this XML data represents growing business value. As more business transactions are conducted through interfaces that rely on XML data, the need to retain some of those XML documents for auditing and regulatory compliance grows. Government agencies and commercial enterprises may be required to preserve the original request, claim, order, trade, or submission represented in the XML documents for legal or business reasons. In other cases, the XML document contains enough business information that the enterprise needs to preserve it for downstream processes or for business analysis and insight. The challenge for these organizations is to determine the best way to retain the XML data for these purposes.</p>
<P>
<h3>Storing XML in Relational Databases</h3>
<P>

<P>
XML data from electronic forms, business transactions, and information exchanges with customers and partners often represents important business information that requires secure and reliable retention. File systems can provide storage but may fail to meet requirements for efficient and controlled access and integrated backup and recovery. Most businesses rely on relational databases to provide secure storage along with access control, concurrency management, efficient access, and backup and recovery services. The business value that XML data represents warrants the storage management capabilities of a relational database.</p>
<P>

<P>
Relational database systems can manage XML data in a variety of ways. The methods for storing XML in a relational database can be grouped into these categories:</p>
<P>
<ul>
<li><strong>Large object:</strong> a serialized string representing the XML document (CLOB, BLOB, and so on)</li><br />
<li><strong>Decomposed:</strong> individual data elements extracted from XML and mapped to relational columns (often referred to as "shredding")</li><br />
<li><strong>Native XML:</strong> a full XML representation within the database.</li>
</ul>
<P>

<P>
Each of these methods has use cases where it provides the best solution for storing XML. However, the first two involve significant compromises in application complexity or performance for most use cases (see the "performance comparison" paper in the <a href="/story/showArticle.jhtml?articleID=216402743&pgno=2#resources">Resources</a> section for a review and quantification of the cases where each solution shows benefits). Native XML, as implemented by DB2 pureXML, provides superior results for all but the simplest use cases because it makes the contents of the XML document directly available to the application, in whole or in part, without requiring reparsing or reassembly.</p>
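<P>
The compromises of the first two methods can be sketched outside DB2. The following Python example is a minimal illustration, not DB2 code: it uses the standard-library sqlite3 and xml.etree.ElementTree modules, and the order document, table names, and function names are all hypothetical. The large-object variant must fetch and reparse the whole document to answer a query, while the shredded variant answers with plain SQL but depends on mapping code that has to change whenever the XML structure changes.</p>

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document used only for illustration.
ORDER_XML = (
    "<order id='1001'>"
    "<customer>Acme</customer>"
    "<item sku='A1' qty='2'/>"
    "<item sku='B7' qty='5'/>"
    "</order>"
)

conn = sqlite3.connect(":memory:")

# Large-object style: the document is kept as one opaque string.
conn.execute("CREATE TABLE orders_lob (doc TEXT)")
conn.execute("INSERT INTO orders_lob VALUES (?)", (ORDER_XML,))

# Decomposed ("shredded") style: values are mapped to relational columns.
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
root = ET.fromstring(ORDER_XML)
order_id = int(root.get("id"))
conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, root.findtext("customer")))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order_id, item.get("sku"), int(item.get("qty"))) for item in root.findall("item")],
)


def total_qty_from_lob(connection):
    """Fetch the whole document and reparse it to answer the query."""
    (doc,) = connection.execute("SELECT doc FROM orders_lob").fetchone()
    return sum(int(item.get("qty")) for item in ET.fromstring(doc).findall("item"))


def total_qty_from_shredded(connection):
    """Answer with plain SQL; the mapping code above must track the XML structure."""
    (total,) = connection.execute("SELECT SUM(qty) FROM order_items").fetchone()
    return total
```

<P>
Both functions return the same total; the difference lies in where the parsing and mapping work happens. A native XML column is meant to avoid both costs, which this stand-in cannot demonstrate.</p>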
<P>

<P>
DB2 pureXML allows XML data to be stored and queried in its inherent hierarchical form, as shown in Figure 1. DB2 has sophisticated XML indexing capabilities and support for optional XML Schema validation. DB2 also provides two industry-standard query languages, SQL/XML and XQuery, for accessing and manipulating XML data. For a more detailed review of the capabilities of DB2 pureXML, see the <a href="/story/showArticle.jhtml?articleID=216402743&pgno=2#resources">Resources</a> section.</p>
 
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/pureXML.gif" alt="DB2 pureXML" border="0"><br /><strong>Figure 1: DB2 pureXML.</strong></p>
<P>

<P>
The primary benefits of using pureXML to store XML data are:</p>
<P>
<ol>
<li>Reduced application size and complexity. pureXML eliminates the need to decompose and recompose XML data (shredding) or to transfer and parse full documents for each query (for example, from CLOBs).</li>
<br />
<li>Improved storage efficiency and flexibility. Optimized XML storage persists XML "as is," including allowing varying or evolving data structure without requiring database schema changes.</li>
<br />
<li>Better application performance.</li>
</ol>
<P>

<P>
Note that each of these benefits affects the application, but the second one also affects the database. The database flexibility is a significant benefit to business operations and a reduction in workload for the DBA because changes to the data exchange contents, the XML, will no longer require the database schema to be altered. This results in higher system availability and eliminates delays waiting for change windows.</p>
<P>
<h3>Storing XML</h3> 
<P>

<P>
Applications have long been able to use large objects and shredding to store XML-sourced data within relational database tables. DB2 continues to support these options; however, starting with DB2 9, it also provides pureXML to enable XML data to be stored natively. Database administrators can still support applications that shred XML data into relational tables or store it as large objects, but for new (or evolving) applications, they can help architects and developers eliminate this processing by storing the XML data in a single column. When pureXML is used, database administrators and architects no longer need to maintain tedious XML-to-relational data mappings, which are hard to adjust as XML data models evolve. Eliminating the data mapping process enables an end-to-end XML architecture for applications and greatly simplifies the associated application code development and maintenance.</p>
<P>

<P>
Directly storing XML documents in their hierarchical structure also means fewer relational objects (tables and columns) in the database. A single pureXML column can hold full XML documents, whereas large XML documents, such as those associated with industry standards, often shred into hundreds of tables with thousands of columns. And when the XML structure evolves (for example, when new elements are added to capture a new business data need), the database doesn't need to be modified, which saves time and effort and preserves system availability.</p>
<P>

<P>
When a business stores XML in a pureXML column instead of shredding, the DBA's job is simplified even though the data being stored has more semantic (and business) value than the shredded alternative. The New York State Department of Taxation and Finance realized this data management simplification as it moved to XML-based tax processing with DB2 pureXML to manage the tax filings (see <a href="/story/showArticle.jhtml?articleID=216402743&pgno=2#resources">Resources</a>).</p>
<P>

<P>
Even when decomposition is determined to be the appropriate storage model, DB2 9 provides an enhanced decomposition facility based on an XML schema that can greatly simplify and speed up the shredding process. By being aware of this capability, DBAs can again help their architects and application development teams implement efficient XML-oriented applications.</p>
<P>
<h3>Using Existing DBA Skills</h3> 
<P>

<P>
Using pureXML to store XML in a database takes advantage of the strengths of not only the database but also the database administrator. XML stored in a pureXML column gains the security, access control, optimized access, and backup and recovery capabilities that relational databases like DB2 provide. Although XML data has different characteristics than traditional relational data, once it is stored in an XML column in DB2, DBAs perform the same actions and use the same tools that they already use with existing DB2 data. Some of the DBA tasks that apply equally to pureXML and traditional relational data are listed in Table 1.</p>
<P>

<P>
<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dba_activity.gif" alt="DBA activities for relational and pureXML data" border="0"><br /><strong>Table 1: DBA activities for relational and pureXML data.</strong></p>
				
					
<P>
XML support is implemented deeply within DB2, so the database engine handles most of the complexity of storing XML and makes it appear to users, tools, and utilities as just another data type. (The details of how DB2 stores and manages XML are beyond the scope of this article; suffice it to say that it is a sophisticated implementation that automatically and efficiently handles tasks such as parsing, compression, and tree navigation. See the <a href="#resources">Resources</a> section for links to more detail.) Just as the SQL language standard has been extended to deal with XML, DB2 services and utilities have been extended to recognize and operate on XML data. This gives database administrators an opportunity to manage XML and to enhance their value while drawing on their well-established experience with relational data storage. For instance, XML data is still added to or removed from a table using the established LOAD and UNLOAD utilities. Indexes are still created with the CREATE INDEX statement (extended to designate specific XML elements or attributes), and optimal query performance on XML data relies on appropriate use of RUNSTATS, EXPLAIN, and REORG, just as it does for relational data.</p>
<P>

<P>
Let's take a closer look at RUNSTATS as an example, since most DB2 database administrators are familiar with it and it shows how deeply pureXML support is integrated in DB2. The RUNSTATS command has been extended to collect statistics on XML data and XML indexes. The DB2 cost-based optimizer uses these statistics to generate efficient execution plans for SQL/XML and XQuery queries, including deciding when to use relational or XML indexes. Because of the deep integration of XML in DB2, database administrators can continue to use the RUNSTATS command as they have for relational data. Other DB2 utilities have been similarly extended to support XML while minimizing the learning curve for database administrators.</p>
<P>

<P>
To be sure, there are some new considerations when dealing with XML, many of which we cover in the next section. One of the main considerations stems from the fact that an XML document is often much bigger than the discrete data items that traditional relational columns represent (e.g., name, date), possibly containing hundreds or thousands of data items. To handle this efficiently, DB2 physically stores the XML data separately from the relational data in a location called the XML data area (XDA on DB2 LUW, an internal auxiliary table on DB2 for z/OS). This allows different configuration settings, such as page sizes and buffer pools, to be used as needed based on the data characteristics, so that data access is optimal for both XML and relational data.</p>
<P>
<h3>Processing XML: Enhancing DBA Value</h3>
<P>

<P>
XML technology is not new, but its effective use within a relational database is new with DB2 9. This merging of XML and relational database technologies presents DB2 DBAs with an opportunity to expand their capabilities by learning XML in the context of the database and to engage in business discussions regarding internet and business-to-business application technologies.</p>
<P>

<P>
DBAs who manage XML data will have the opportunity to learn about XML itself and related technologies such as XPath, XQuery and XML Schemas. XPath is the W3C standard that defines how to navigate through XML documents and select specific nodes (elements or attributes).</p> 
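<P>
For readers new to XPath, a small example may help. The sketch below uses Python's standard-library xml.etree.ElementTree module, which supports only a limited subset of XPath, and a hypothetical customer document; it is meant to show the flavor of path-based selection, not how DB2 evaluates XPath.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical customer document used only for illustration.
CUSTOMER_XML = """
<customer id="42">
  <name>Jane Doe</name>
  <phone type="work">555-0100</phone>
  <phone type="home">555-0199</phone>
</customer>
"""

root = ET.fromstring(CUSTOMER_XML)

# Select a child element by name.
name = root.findtext("name")

# Select an element by the value of one of its attributes.
work_phone = root.findtext("phone[@type='work']")

# Select every matching node and read an attribute from each.
phone_types = [phone.get("type") for phone in root.findall("phone")]
```

<P>
Each expression names a path through the document tree rather than a table and column, which is the main shift in thinking for someone coming from SQL.</p>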
<P>

<P>
XPath is a good place to start understanding XML processing. The SQL standard has been expanded to include XML query processing incorporating XPath, often called SQL/XML. DB2 pureXML also supports XQuery, a W3C standard that uses XPath. XQuery provides an alternative means to query XML. Finally, DB2 provides a repository and support for XML Schemas, which can define a structure and restriction of contents for particular XML documents, presenting a new opportunity for DBAs to explore this technology and add value to their business processes.</p> 
<P>

<P>
Using DB2 pureXML will give DBAs the chance to explore these new capabilities and add them to their portfolio of skills. We will explore how these technologies can be applied to business decisions.</p>
<P>

<P>
DBAs are in the unique position of knowing the capabilities of the database and how it can effectively support XML data for new applications. In addition to traditional DBA roles, support of DB2 pureXML gives the DBA an opportunity to participate in and contribute to architecture-level and application-level discussions covering the topics below.</p> 
<P>

<P>
<strong><em>The value of end-to-end XML applications.</em></strong> Because many applications send and receive data in XML format, it makes sense to consider storing the data in that same format, avoiding deconstruction and reconstruction for each transaction.  By storing XML in DB2 pureXML, the applications avoid going through costly and time-consuming conversions.  Yet this conversion is precisely what many applications do upon receiving XML, possibly because the developers don't realize they have a viable option for storing and accessing XML efficiently.  DBAs can help application architects understand this option and when to incorporate it into the application design. </p>
<P>

<P>
<strong><em>When to store XML as XML and when to "shred".</em></strong> Storing XML in pureXML provides benefits to most but not all applications. Application architects and designers need to understand when to choose XML over relational storage. Database architects can provide that insight based on their understanding of the tradeoffs and benefits to applications. If the data being modeled is naturally tabular and static, it may be better to represent it in relational format. However, when data is varied, deeply hierarchical, or subject to frequent changes, then the relational model is not usually a good choice to store the XML. Armed with this knowledge of the options available, database administrators can help choose the best approach for data storage based on application needs. Examples of when XML data is a better choice than relational data include:</p>
<P>
<ul>
<li><strong>When the data schema changes often.</strong> The self-describing and extensible nature of XML allows seamless handling of schema variability and evolution. Changes in the XML document format are accommodated without requiring changes to tables or columns in the database and typically without impacting existing XML queries.</li>
<br />
<li><strong>When data has a hierarchical structure.</strong> Because XML is a hierarchical data model, it's a much more natural representation for inherently hierarchical business data, including nested and repeating elements. Using XML allows simple, navigational data access to replace the complex set operations that would be needed if the same data were represented in tabular format.</li>
<br />
<li><strong>When the data contains varying content.</strong> An XML schema can define a very large number of optional elements, which means that each document may contain only a few of them and contents can vary significantly from one XML document to the next. While every row in a relational table has to have the exact same columns, which would result in many NULL values, XML documents can have different elements and be stored in the same XML column.</li>
<br />
<li><strong>When the exchange format is XML.</strong> XML has been used for various industry and business standards. Storing this data in its exchange format (XML) means that the data is ready to be exchanged without any preprocessing.</li>
</ul>
<P>

<P>
There is another decision to which DBAs can contribute. Because DB2 pureXML is a hybrid database, meaning it can store both relational and XML data together, businesses don't have to make an "either or" decision. In some cases, it may be beneficial to store the XML document in an XML column and extract one or more elements as a relational column in the same table. This approach can be useful when certain elements are expected to be frequently used as an access key or possibly to link to another table as a foreign key.</p>
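<P>
The hybrid layout described above can be sketched as follows. This is an illustration under stated assumptions rather than DB2 code: SQLite stands in for the database, a TEXT column stands in for what would be an XML column in DB2, and the claim documents, table, and function names are hypothetical. One frequently used element is copied into its own indexed relational column while the full document is kept intact.</p>

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# policy_no is the extracted relational key; doc holds the complete document.
conn.execute("CREATE TABLE claims (policy_no TEXT, doc TEXT)")
conn.execute("CREATE INDEX claims_policy ON claims (policy_no)")


def store_hybrid(connection, xml_text):
    """Keep the full document and copy one frequently used key into its own column."""
    root = ET.fromstring(xml_text)
    connection.execute(
        "INSERT INTO claims (policy_no, doc) VALUES (?, ?)",
        (root.findtext("policyNo"), xml_text),
    )


def claims_for_policy(connection, policy_no):
    """Look up documents through the relational key, then return them parsed."""
    rows = connection.execute("SELECT doc FROM claims WHERE policy_no = ?", (policy_no,))
    return [ET.fromstring(doc) for (doc,) in rows]


# Two hypothetical claims with different optional elements share one table.
store_hybrid(conn, "<claim><policyNo>P-17</policyNo><amount>250.00</amount></claim>")
store_hybrid(conn, "<claim><policyNo>P-42</policyNo><amount>90.50</amount><note>rush</note></claim>")
```

<P>
The second document carries an optional note element that the first lacks, yet no table change is needed; only the extracted key is fixed in the relational schema.</p>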
<P>
<div class="Article_Sidebar_Float-Right" id="Article_Sidebar">
<h3>Common XML Terms</h3>
<P>

<P>

<a href="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/commonXMLterms.gif" target="_blank"><img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/commonXMLterms_sm.gif" alt="chart" border="0"></a><br>
<b>Click on image to enlarge it.</b></p>
</div>
<P>

<P>
<strong><em>Using XML schemas and validation.</em></strong> XML schemas provide a means to create a defined structure and data profile which individual documents may be validated against. DB2 pureXML provides the means to store XML schema documents and the facilities to validate XML documents against stored schemas. For example, XML documents can be validated prior to storing (on INSERT), rejecting those that don't validate. These DB2 capabilities give the DBA another opportunity to provide value to the architects and application teams by presenting an option for safely storing and controlling XML schemas and for providing database level validation against XML schemas.</p>
<P>

<P>
<strong><em>Querying and reporting on XML.</em></strong> DB2 provides two languages for data manipulation: SQL/XML and XQuery. Both provide access to the same data and offer essentially the same capabilities, so application developers will want advice on when to use which language. This decision will often come down to which language the application team is more comfortable with. For developers who have a history of working with relational databases, SQL/XML may be the better choice because it builds on the SQL experience they already have. In addition to query capabilities via the XMLQUERY and XMLEXISTS functions, the SQL language standard defines functions for constructing, validating, and converting XML data.</p> 
<P>

<P>
In contrast, XQuery is a query language that is specifically designed for querying XML data. XQuery is semantically similar to SQL, which enables database administrators and developers to pick it up rather easily. Database programmers can construct queries in DB2 pureXML using SQL/XML or XQuery or even a combination of the two. See the "How to query XML data efficiently and effectively" section of the "Best Practices" document listed in <a href="#resources">Resources</a> for further information on determining which query language to use. DBAs will also want to help developers match their queries to indexes defined on the XML data to maximize performance of critical, frequently used queries.</p>
<P>

<P>
Reporting on XML data builds on the query capabilities. By writing effective queries, applications can produce the data elements needed for analysis and reports. Many business intelligence and reporting tools, however, haven't yet enhanced their functionality to be able to manipulate complex XML. DBAs can help business analysts construct appropriate queries to extract the needed data elements. Alternatively, they can use the SQL XMLTABLE function to create a view which presents stored XML documents as columns that reporting tools can effectively deal with (or even extract to a separate table for reporting if usage characteristics warrant).</p>
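<P>
The idea of presenting XML as rows and columns can be illustrated in a few lines of Python. This is only a sketch of the flattening concept, not the XMLTABLE function itself; the order document and the function name are hypothetical. Each repeating item element becomes one flat row that a reporting tool could consume.</p>

```python
import xml.etree.ElementTree as ET


def order_rows(xml_text):
    """Yield one flat (order_id, sku, qty) row per item element."""
    root = ET.fromstring(xml_text)
    for item in root.findall("item"):
        yield (root.get("id"), item.get("sku"), int(item.get("qty")))


# Hypothetical order document with two repeating item elements.
rows = list(
    order_rows("<order id='1001'><item sku='A1' qty='2'/><item sku='B7' qty='5'/></order>")
)
```

<P>
The parent's identifier is repeated on every row, which is how a nested document is typically made to fit a tabular report.</p>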
<P>

<P>
<strong><em>How to produce XML from relational data.</em></strong> Existing applications rely on data stored in DB2 relational structures for many processes and transmissions. In some cases the application will need to produce XML structures. DB2 provides numerous functions, defined by the SQL standard, that produce XML structures from relational data (these are sometimes called "publishing" functions). This resource lets the database professional present an alternative to application architects who need to construct XML documents from relational data. It may be more efficient to produce the XML during the query process than to extract and transfer the relational data to an application that will then construct the XML document.</p>
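<P>
To make the row-to-element mapping concrete, here is a minimal Python sketch. It is not DB2 code: SQLite stands in for the relational store, and the employee table, its contents, and the function name are hypothetical. It builds the XML in application code, which is exactly the work that SQL/XML publishing functions such as XMLELEMENT, XMLATTRIBUTES, and XMLAGG can instead perform inside the query.</p>

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ana", "D11"), (2, "Raj", "D21")],
)


def employees_as_xml(connection):
    """Map each relational row to an employee element under a single root."""
    root = ET.Element("employees")
    query = "SELECT id, name, dept FROM employee ORDER BY id"
    for emp_id, name, dept in connection.execute(query):
        emp = ET.SubElement(root, "employee", {"id": str(emp_id)})
        ET.SubElement(emp, "name").text = name
        ET.SubElement(emp, "dept").text = dept
    return ET.tostring(root, encoding="unicode")


xml_text = employees_as_xml(conn)
```

<P>
Here every row crosses from the database to the application before any XML exists; constructing the document during the query avoids that transfer.</p>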
<P>

<P>
<strong><em>When to store inline versus XDA.</em></strong> DB2 stores XML data separately from relational data to allow for different configuration settings for the two types of data. DB2 9.5 also allows DBAs to choose to store XML documents of a specified maximum size on the data pages with the relational data (this is called "inline") if they believe that the characteristics are similar to the relational data. When inline storage is used for XML data, DBAs can also use additional compression on the inline XML documents. These capabilities give DBAs additional options for maximizing system storage efficiency and performance.</p>  
<P>

<P>
There are many opportunities for DBAs and other database professionals to increase their database skills when they use DB2 pureXML to manage XML data in the database. They can expand the capabilities they offer their organizations for storing XML. And by learning about the benefits and tradeoffs of pureXML, they can participate in discussions with application architects and developers about the best choices for managing XML data to maximize application productivity and performance. DBAs have the opportunity to expand their influence by providing new XML insights.</p>
<P>
<h3>Meeting Needs with Native XML</h3>
<P>

<P>
IBM pureXML provides native XML support in DB2 to address the emerging need for storing XML. DB2 pureXML provides an alternative to XML-to-relational data mapping processes while utilizing existing utilities and tools to efficiently store and manage the XML data in DB2. With DB2 pureXML, DBAs will be able to immediately apply their significant skills to help businesses manage XML growth while learning new XML skills such as XPath and XQuery. XML presents a great opportunity for database professionals to help their businesses and help themselves at the same time. To learn more about DB2 pureXML, review the materials in the Resources section.</p>
<P>
<hr width="60%">
<P>

<P>
<em>The authors thank the individuals who assisted with or reviewed this article, including Conor O'Mahony and Jason Cu at IBM and Bob LaCerais and his database team at the New York State Department of Taxation and Finance.</em></p>
<P>

<P>
<em><strong>Bryan Patterson</strong> is a senior solutions architect at IBM's Silicon Valley Laboratory who specializes in database management.  He has over 20 years of software industry experience including management positions in development, quality assurance and product planning.  His email address is <a href="mailto:bryanp@us.ibm.com">bryanp@us.ibm.com</a>.</em></p>
<P>

<P>
<em><strong>Dexiong Terry Zhang</strong> is a software engineer intern at IBM's Silicon Valley Lab. He graduated from San Jose State University in Computer Science. He joined the DB2 pureXML Enablement team in 2008 and is working on integration of pureXML in various projects.</em></p>
<P>
<a name="resources"></a><h3>Resources</h3>
<P>

<P>
"<a href="http://www.ibm.com/developerworks/data/library/techarticle/dm-0606nicola/" target="_blank">pureXML in DB2 9: Which way to query your XML data?</a>" Nicola, Matthias and Fatma Ozcan, IBM developerWorks, 08 June 2006, 28 August 2007</p>
<P>

<P>
"<a href="http://www.ibm.com/developerworks/db2/library/techarticle/dm-0612nicola/" target="_blank">A performance comparison of DB2 9 pureXML and CLOB or shredded XML storage</a>," IBM developerWorks, 07 Dec 2006</p>
<P>

<P>
"<a href="http://download.boulder.ibm.com/ibmdl/pub/software/dw/dm/db2/bestpractices/DB2BP_XML_0508I.pdf" target="_blank">Best Practices &#8212; Managing XML Data</a>," Nicola, Matthias and Susanne Englert, IBM</p> 
<P>

<P>
"<a href="http://www.redbooks.ibm.com/abstracts/sg247298.html?Open" target="_blank">DB2 9 pureXML: Overview and Fast Start</a>," Saracco, Cynthia M., Donald Chamberlin, and Rav Ahuja, IBM Redbooks, 13 July 2006</p>
<P>

<P>
<a href="http://www.ibm.com/software/data/db2/xml/" target="_blank">DB2 pureXML</a></p>
<P>

<P>
<a href="http://www.ibm.com/developerworks/wikis/display/db2xml/Home" target="_blank">DB2 pureXML Enablement Wiki</a></p> 
<P>

<P>
<a href="ftp://ftp.software.ibm.com/ps/products/db2/info/vr9/pdf/letter/en_US/db2xge90.pdf" target="_blank">DB2 Version 9 for Linux, UNIX, and Windows XML Guide</a></p>
<P>

<P>
<a href="http://www.ibm.com/developerworks/forums/forum.jspa?forumID=1423" target="_blank">pureXML Discussion Forum</a></p>
<P>

<P>
Case Study: <a href="http://www.ibm.com/developerworks/wikis/download/attachments/5080391/NYS_Tax_IMC14008_USEN_00.pdf" target="_blank">New York State Tax Agency Uses pureXML to Simplify Filing of More than 2 Million Returns</a></p> 

				]]></body>
		</item>
	
		<item>
			<title><![CDATA[Building a Smarter Planet]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300258&cid=RSSfeed]]></link>
			<description><![CDATA[In times of plenty, using less to do more is a smart strategy. In this economy, it's a requirement. Well-managed, accessible information is the key to smarter resource use in government, commerce, and at home -- and a win-win situation for business and the environment.]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:16 EDT</pubDate>
			<keywords><![CDATA[Smarter Planet, Informix Dynamic Server, Information Overflow, Data Management, Green IT, DB2, IDS]]></keywords>
			<blurb><![CDATA[In times of plenty, using less to do more is a smart strategy. In this economy, it's a requirement. Well-managed, accessible information is the key to smarter resource use in government, commerce, and at home -- and a win-win situation for business and the environment.]]></blurb>
			<authors><![CDATA[Christy Maver]]></authors>
			<body><![CDATA[
			
					
<P>
<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/feature_image1sm.jpg" alt="Building A Smarter Planet" class="Image_Float-Left" border="1" width="250" />With the explosion of information, the volatility of energy markets, and the economic uncertainties brought on by recession, no one is immune to the global trends that are disrupting and transforming the fabric of business. People, companies, organizations, nations, and economies are becoming more instrumented, interconnected, and intelligent, forcing us to think and act in new ways to make our systems more efficient, productive, and responsive &mdash; in short, to build a smarter planet.
<P>
<h3>Information Overflow</h3>
<P>

One of the key challenges facing enterprises today is missed opportunity stemming from information raging out of control. Enterprises are handling more information than ever, and they're struggling to keep pace. Too little of the information that's being created in real time is being effectively captured, managed, analyzed, and made available to people who need it. </p>
<P>

Yet within this information explosion, something meaningful is happening. The world is changing, and information is at the heart of this revolution, enabling things that weren't possible even a few, short years ago. Today, companies are learning to harness the power of the three major trends that are defining our current information age:</p>
<ul>
  <li><strong>Instrumentation. </strong>The transistor is the building block of the digital age. Could you imagine a world where there would be one billion transistors per human? That world will be a reality sometime in 2010. Remember how you thought of your first cell phone as your "emergency" or "pizza-ordering from the car" phone? There are currently four billion mobile phone subscribers. Within two years, 30 billion Radio Frequency Identification (RFID) tags will have been produced globally. </li>
  <li><strong>Interconnection. </strong>Nearly two billion people use the Internet. Once a static environment for surfing, researching, and emailing, the Internet today lets users "speak" to each other, producing trillions of connections between people and "intelligent" objects.</li>
  <li><strong>Intelligence. </strong>Every day, 15 petabytes of new information are generated &mdash; eight times more than the information in all the libraries in the United States. With information exploding this way, new computing models are needed to manage the volume, and advanced analytics are required to produce predictive capabilities that yield better results. </li>
</ul>
<h3>Getting Smarter</h3>
<P>

To use these new capabilities to become smarter, organizations need to do three things:</p>
<P>

<strong>Focus on value. </strong>It's a familiar mantra: Do more with less. Yet, even as available capital shrinks, all organizations must remain flexible. That's why a laser focus on core businesses and initiatives is so important. The time to realign relationships by examining the financial solidity of suppliers, partners, and customers is now.</p>
<P>

<strong>Exploit opportunities. </strong>Smart businesses and organizations are looking to capture market share by disrupting weak competitors and acquiring strategically. At the same time, they're building future capabilities, protecting and acquiring talent, and trying to change the industry with bold moves and global positioning.</p>
<P>

<strong>Act quickly. </strong>Change can be disruptive. Those who manage change successfully do it by clearly communicating simple goals and seeking and leveraging experience. They establish leadership by getting the information to take action and by setting the agenda. And they handle risk and transparency through business performance management, analytics, and risk management.</p>
<P>
<div class="Article_Sidebar_Float-Right" id="Article_Sidebar"> 
<h3>Building Blocks for a Smarter Planet</h3>
<P>

Smarter Planet is a movement&mdash;a global initiative being carried out in every part of IBM and quickly moving beyond. It has already been featured in newspaper articles and editorials, commercials, and meetings between corporate representatives and heads of state, including CEO Sam Palmisano and U.S. President Barack Obama. </p>
<P>

Smarter Planet is built on four major components, each of which addresses a critical question:</p>
<ul>
  <li><strong>New intelligence. </strong>Data is exploding but is trapped in silos. How can we take advantage of the wealth of information available in real time from a multitude of sources to make more intelligent choices?</li>
  <li><strong>Smart work.</strong> We need new business processes to meet new demands. How can we work smarter while supported by flexible and dynamic processes modeled for the way people buy, live, and work?</li>
  <li><strong>Dynamic infrastructure.</strong> Our infrastructure is inflexible and costly. How do we create an infrastructure that drives down cost, is intelligent and secure, and is as dynamic as the business climate?</li>
  <li><strong>Green and beyond.</strong> Our resources are limited. How do we drive greater efficiencies, compete more effectively, and respond more rapidly by taking action now on conserving energy, protecting the environment, and achieving sustainability? </li>
</ul>
<P>

Learn more about the Smarter Planet initiative at ibm.com/innovation.</p></div>
<P>

<P>
<h3>Data Management and Beyond</h3>
<P>

Today, many businesses understand the need to drive greater efficiencies by taking action on energy, the environment, and sustainability. Energy costs are rapidly increasing: Application workloads are doubling every two years, and new environmental regulatory mandates are affecting many business and institutional arenas. Meanwhile, the unprecedented explosion of information is accompanied by information processing requirements; paper-based processes are too expensive and too slow. Companies are struggling to manage, track, and retrieve information for regulatory compliance, for insight into energy-related metrics, and for many business purposes. </p>
<P>

"Going green" is not merely altruistic; meeting information responsibilities and caring for the planet are not mutually exclusive. Efficient data management is the answer. Data compression, paper reduction, and intelligent archiving can reduce energy requirements and optimize resource utilization, addressing green initiatives while also creating a competitive advantage.</p>
<P>

Informix Dynamic Server (IDS) and DB2 have always been central to IBM's "Green and Beyond" vision (see sidebar). The following stories from IBM customers demonstrate the crucial role of data management in green initiatives. </p>
<P>

				
					<h3>Going Green with Informix</h3>
<P>

Fans of the 1939 movie classic <em>The Wizard of Oz</em> know Dorothy's home state, but perhaps her journey should have taken her to a different Emerald City: Lenexa, Kan. In this city, home to the Informix lab, developers, DBAs, and data architects are consistently improving the database known for its availability by rolling out enhancements that help its users create economically and environmentally sound solutions. </p>
<P>

Konkan Railway, which transports people and goods between Mumbai and Mangalore, is the youngest and hippest railway system in India. For 10 years, the company has used IBM Informix Dynamic Server (IDS) to streamline services, increase availability, and improve performance. Since introducing IDS to its Railway Application Package, Konkan has been able to assess information more efficiently and cultivate better decision-making. This, in turn, has led to 20 percent lower energy costs and a reduction of passenger delays throughout its service area. According to Vijay Devnath, IT manager at Konkan Railway, the company needs only three DBAs to manage more than 66 database instances, which keeps costs down while supporting rapid growth.</p>
<P>
<h3>Going Green with DB2 </h3>
<P>

DB2's deep compression capabilities support the green movement by reducing the resources needed to store information. SunTrust Bank, one of the largest banking institutions in the eastern United States, was experiencing low return on investment for its IT infrastructure. The bank needed to improve memory, storage, and server capability for its database environment while managing risk and compliance and increasing productivity. SunTrust migrated its IBM System p servers from DB2 8 to DB2 9 to take advantage of deep compression. As a result, the bank sees compression rates as high as 83 percent, saving the bank an immediate $2 million and providing $500,000 per year in ongoing savings. </p>
<P>

One of the key project components was the ability to install multiple DB2 versions and fix packs on the same computer. Additional benefits from table partitioning led to increased query performance due to partition elimination. The Self Tuning Memory Manager (STMM) senses the underlying workload and tunes memory settings based on need. </p>
<P>
<h3>Gold Consultants Take the Green Medal</h3>
<P>

IBM's Gold Consultants are helping to build a smarter planet by using IBM database products to create greener information technology. </p>
<P>

<a href="http://www.ibmdatabasemag.com/blog/main/archives/db2_for_luw_performance/index.html">Scott Hayes</a>, a popular blogger for <a href="http://ibmdatabasemag.com">ibmdatabasemag.com</a>, an IBM Gold Consultant, an IBM Data Champion, and president and CEO of Database-Brothers Inc., is always looking for ways to improve business performance. Recently, he discovered that he could do this while lowering energy costs at the same time. </p>
<P>

In a recent YouTube video, "DB2 Performance Management for a Smarter Planet GREEN IT World," Hayes demonstrates the "ripple effects" of performance benefits by tuning an IBM DB2 9.5 database to its optimal level. As Hayes points out, lower CPU consumption means lower energy costs. </p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_f1_youtube.jpg" width="400" height="286"></p>
<P>

<strong>Watch Scott Hayes of DBI show the "ripple effects" of <a href="http://tinyurl.com/agtmbq" target="_blank">tuning an IBM DB2 9.5 database</a></strong></p>
<P>

The IBM Gold Consultant program is a "mutual value" program with highly skilled consultants from around the world who have been instrumental in designing, implementing, optimizing, and migrating some of the largest database and transaction systems worldwide. In addition to their engagements with Fortune 2000 customers, many of the consultants teach courses, write articles and books, and present at conferences. IBM provides this highly regarded group with timely information on its products and strategies.</p>
<h3>Call to Action </h3>
<P>

Disruptions in the economic environment present transformative opportunities for smart businesses globally. The urgency for change is driving us to look at our planet in a new way, to find opportunities to become even more interconnected, instrumented, and intelligent. IBM and its customers continue to demonstrate what's possible on a smarter planet. </p>
<P>

</p>
<hr width="60%" />
<P>

<P>
<em>Christy Maver is a marketing and communications specialist in IBM's Information Management group. She has been with the company for eight years and currently focuses on messaging and strategy for its Smarter Planet initiative.</em></p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_f1_smartworks.jpg" width="400" height="695"></p>

				]]></body>
		</item>
	
		<item>
			<title><![CDATA[Optimizing Business with XML]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300305&cid=RSSfeed]]></link>
			<description><![CDATA[Those who master the new business language reap the benefits of simpler IT environments, easier data integration, and a more flexible infrastructure. ]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:15 EDT</pubDate>
			<keywords><![CDATA[Native XML, pureXML]]></keywords>
			<blurb><![CDATA[Those who master the new business language reap the benefits of simpler IT environments, easier data integration, and a more flexible infrastructure. ]]></blurb>
			<authors><![CDATA[Scott Bisang]]></authors>
			<body><![CDATA[
			
					
<P>
<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/feature_image2sm.jpg" alt="Optimizing Business with XML" class="Image_Float-Left" border="1" width="250" />Since the introduction of native XML support in major database management systems, the amount of business information cast in this popular format has skyrocketed. The reasons are clear: XML is unrivaled both as a data exchange format for industry standards and as a data interchange format for application developers. Most industries are developing standards for data interchange in XML, and some XML standards, like Extensible Business Reporting Language (XBRL) for the reporting of business and financial data, transcend all industries. </p>
<P>

Yet XML's remarkable decade of growth nearly stalled due to the initial difficulty of storing and managing information in an XML format. As companies developed applications to store or retrieve XML data, they often used their existing infrastructures, including file systems and relational databases. These systems weren't designed to handle XML, and the resulting workarounds required expensive transformations with significant processing overhead. Some tried XML-only databases, only to find that segregating XML data in its own repository created yet another system that had to be separately maintained.</p>
<P>

IBM addressed these challenges by building native XML support into DB2, allowing XML to be stored and managed in its native format together with data in relational format. This XML capability, called DB2 pureXML, shortens development time, lowers maintenance costs, and dramatically improves application performance when storing and retrieving data.</p>
<h3>The Benefits of Native XML</h3>
<P>

There are many reasons why XML data should be stored in its native form, as the following customer scenarios show. </p>
<P>

<strong>Simplifying the IT environment.</strong> Using DB2 pureXML, Storebrand Group, a Norwegian financial services company, reduced the lines of code for writing to and reading from its database by 65 percent. With less code to develop, test, and maintain, developers are freed to work on more productive tasks. In addition, Storebrand found that schema changes are now easier. In the past, adding just a single field took a day of work (development and testing) and a week to implement because of the processes involved with database changes. Now, developers can simply change the pointer to the schema in a DB2 XML configuration file, which takes about five minutes.</p>
<P>

Schema changes previously required so much effort due to the nature of the workarounds created to handle XML before native storage options existed. One method, called "shredding" or "decomposition," maps XML into a tabular format. Another approach puts XML data into a single large object (LOB) cell in a table. Both approaches work, but there are significant drawbacks, especially as the amount of XML data grows. </p>
<P>

Shredding is a popular approach for quickly retrieving individual pieces of information from the database. However, this fast query performance comes at a cost in terms of the effort required to map the XML data into a table and the processing overhead associated with inserting information into the database. This cost increases if the original XML data has to be recreated from the shredded fields.</p>
<P>

Before XML can be shredded, a relational schema must be designed. This process can be labor-intensive, although it can be partially automated with off-the-shelf tools. However, the resulting tables will need to be carefully examined and optimized. After designing the relational schema, the environment that actually maps the XML to the relational schema must be set up. Then comes the development and testing of code for using the data (and this code is typically complex because it requires unwieldy SQL statements with multiple JOINs). </p>
<P>

The significant overhead for shredding XML data into a relational schema is only a part of the story. XML schema changes are a fact of life, and they can play havoc with relational schemas, mapping processes, and application code. That's why many organizations realize such gains from adopting DB2 pureXML storage. </p>
<P>

<P>
As a practical example, consider the Financial products Markup Language (FpML) industry standard protocol for complex financial products. With DB2 pureXML, dealing with FpML messages is straightforward: Just store the complete message in an XML column in a single table. However, in some implementations, using shredding to store FpML messages could require working with more than 475 separate database tables. Maintaining 475 tables is significantly more complex than managing just one. </p>
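<P>

To make the contrast concrete, here is a minimal Python sketch that stores the same message both ways. It uses the standard library's sqlite3 and xml.etree modules purely as stand-ins and does not use DB2 or pureXML. The tiny trade message, the table names, and the column names are all invented for illustration; real FpML is far richer.</p>

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified trade message (not real FpML).
MESSAGE = """<trade id="T1">
  <party role="buyer">Alpha Bank</party>
  <party role="seller">Beta Fund</party>
  <product><name>Swap</name><notional>1000000</notional></product>
</trade>"""

db = sqlite3.connect(":memory:")
root = ET.fromstring(MESSAGE)

# Approach 1: keep the document intact in a single column.
db.execute("CREATE TABLE trade_doc (id TEXT PRIMARY KEY, doc TEXT)")
db.execute("INSERT INTO trade_doc VALUES (?, ?)", (root.get("id"), MESSAGE))

# Approach 2: shred the document into relational tables.
db.execute("CREATE TABLE trade (id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE party (trade_id TEXT, role TEXT, name TEXT)")
db.execute("CREATE TABLE product (trade_id TEXT, name TEXT, notional INTEGER)")
db.execute("INSERT INTO trade VALUES (?)", (root.get("id"),))
for party in root.findall("party"):
    db.execute("INSERT INTO party VALUES (?, ?, ?)",
               (root.get("id"), party.get("role"), party.text))
product = root.find("product")
db.execute("INSERT INTO product VALUES (?, ?, ?)",
           (root.get("id"), product.findtext("name"),
            int(product.findtext("notional"))))

# Reading the intact document back is a single-row lookup.
stored = db.execute("SELECT doc FROM trade_doc WHERE id = 'T1'").fetchone()[0]

# Reassembling the shredded form needs a join across every table involved.
rows = db.execute("""
    SELECT t.id, p.role, p.name, pr.name, pr.notional
    FROM trade t
    JOIN party p ON p.trade_id = t.id
    JOIN product pr ON pr.trade_id = t.id
    ORDER BY p.role
""").fetchall()
```

<P>

Even in this toy case, a new child element in the message means a new column or table, new insert code, and a changed join in the shredded version, while the single-column version is unaffected.</p>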
<P>

<strong>Boosting IT productivity. </strong>The UCLA Medical Center uses DB2 pureXML to manage patient medical records, diagnostic images, and even handwritten doctor's notes. Hospital employees insert the information into the Patient Oriented Document System, which allows doctors quick access to the information to ensure high-quality patient care. By using DB2 pureXML, UCLA realizes significant productivity improvements: The time required for certain IT projects is reduced from weeks to hours.</p>
<P>

One of the advantages of working with a DB2 pureXML repository is that the data doesn't require any special treatment or transformations before storage or retrieval. XML is stored directly in the repository and retrieved from the same location. This simplified way of working with XML data reduces the time needed for many common tasks. Such time savings are increasingly valuable in today's IT environments.</p>
<P>
<div class="Article_Sidebar_Float-Right" id="Article_Sidebar"> 
<h2>The Complete XML Toolbox</h2>
<P>

DB2 pureXML is a great advance for XML data persistence. Thanks to pureXML, many organizations are reporting significant gains in both performance and productivity.</p>
<P>

However, when it comes to storing XML data, pureXML is just one tool in your toolbox. DB2 pureXML is probably the most important tool, but there are occasions when alternative approaches to storing XML in DB2 may prove better. Sometimes it is better to store XML data in a CLOB or BLOB. And sometimes it's better to use shredding. </p>
<P>

IBM's Conor O'Mahoney, the creative force behind the Native XML Databases blog, explains <a href="/story/showArticle.jhtml?articleID=216403288">how to make sense of the various XML options and when to use each</a>. </p></div>
<P>

<P>
<strong>Improving information integration. </strong>China Huadian Corp. (CHD), which manages more than 100 utility and financial services companies, created a flexible data analysis and reporting system built on DB2 9. The system manages data from diverse facilities, adjusting easily to new data reporting and delivery requirements, new types of industrial facilities, and the removal or addition of assets. Report data from each of CHD's branches is stored in DB2 pureXML. This setup accommodates various schemas and report formats while also making the data accessible to a variety of organizations, which improves communication between business and IT staff. DB2 pureXML makes it easy to add, update, or delete reported items. Detailed production and cost information is integrated and displayed using the company's analysis and reporting applications. </p>
<P>

Integrating information among various business units and locations can be difficult and frustrating. A flexible, automated, and scalable platform for information reporting makes the process much easier to manage. By building a better information reporting system, CHD gained greater business insight and agility. It also reduced cost and labor for implementing application changes and improved responses to changes in regulatory and management reporting requirements. </p>
<P>

<strong>Supporting XML schema flexibility and evolution. </strong>One additional area to consider when evaluating XML storage options pertains to XML schemas, which define the structure of XML data. An XML schema describes the XML elements and attributes that can appear in the data, where they can appear, and how often. Validating the XML data means making sure that the XML data adheres to the rules set out in the XML schema. </p>
<P>

XML schemas define an agreed-upon vocabulary of XML tags for a specific application scenario (such as financial trading, medical records, or insurance claims). But that vocabulary and the application scenarios it supports can change over time, so schema flexibility and schema evolution are important. To explain the need for these features, let's look at a couple of scenarios.</p>
<P>

Many tax authorities in the United States store information from tax forms in XML format. Tax forms change nearly every year, and changes to tax forms mean changes to the XML schema. Yet the new schema won't necessarily be the correct one for all records. Instead, records should be validated against the schema that was in existence when the record was created. In this case, schema flexibility &mdash; the ability to cater to a wide range of XML schema needs &mdash; is a key requirement. The application must have the option to validate the cells in a database column against different schemas (or to not validate against any schema).</p>
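<P>

The idea of validating each record against the schema it was created under can be sketched in a few lines of Python. This is an illustration under loud assumptions, not DB2 behavior: a "schema" here is just a set of element names a record must contain, which is a drastic simplification of a real XML Schema, and the tax years, element names, and the validate function are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical schema registry: tax year -> element names a return must contain.
# A real system would register full XML Schema (XSD) documents instead.
SCHEMAS = {
    "2007": {"taxpayer", "income"},
    "2008": {"taxpayer", "income", "energy_credit"},
}

def validate(xml_text, schema_id):
    """Check a record against the schema it was created under.

    A schema_id of None means the record is stored without validation.
    """
    root = ET.fromstring(xml_text)  # the record must still be well-formed XML
    if schema_id is None:
        return True
    present = {child.tag for child in root}
    return SCHEMAS[schema_id].issubset(present)

# One column, three records, each tied to its own schema (or to none).
records = [
    ("<return><taxpayer>A</taxpayer><income>10</income></return>", "2007"),
    ("<return><taxpayer>B</taxpayer><income>20</income>"
     "<energy_credit>5</energy_credit></return>", "2008"),
    ("<note>free-form attachment</note>", None),
]
results = [validate(doc, schema_id) for doc, schema_id in records]
```

<P>

In this sketch every record passes against its own schema, yet the 2007 record would fail if it were checked against the 2008 rules, which is exactly why the schema has to be chosen per record rather than per column.</p>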
<P>

Another common scenario involves storing messages that adhere to one of the major XML industry standards (the healthcare standard HL7 or FpML, for example). Industry standards are continually evolving; moving to a new version of an XML standard usually means a new XML schema. </p>
<P>

				
					
<P>
</p>
<P>
Migrating to a new version of an XML standard is easier if the database management system enables the migration to this new XML schema without revalidating or changing the entire existing XML document. For instance, if data must adhere to the new XML standard, then the database management system should ensure that existing data (originally validated against an XML schema for an older version of the standard) adheres to the new XML schema. This data server feature is called compatible schema evolution. &#91;Note: Incompatible schema evolution is also possible.&#93; DB2 pureXML makes it especially easy to handle XML schema flexibility and XML schema evolution. </p>
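<P>

One way to picture compatible schema evolution is as a check that every document valid under the old schema is still valid under the new one. The Python sketch below does this for a toy model in which a schema is only a set of required and optional element names; the Schema class, the is_compatible function, and the element names are assumptions made for illustration and say nothing about how DB2 itself decides compatibility.</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Schema:
    """Toy schema: element names that must appear and that may appear."""
    required: frozenset
    optional: frozenset

    @property
    def allowed(self):
        return self.required | self.optional

def is_compatible(old, new):
    """True if every document valid under `old` is also valid under `new`."""
    return (new.required.issubset(old.required)
            and old.allowed.issubset(new.allowed))

v1 = Schema(frozenset({"patient", "observation"}), frozenset({"note"}))
# v2 only adds an optional element, so existing documents stay valid.
v2 = Schema(frozenset({"patient", "observation"}),
            frozenset({"note", "device"}))
# v3 makes a new element mandatory, so older documents may fail validation.
v3 = Schema(frozenset({"patient", "observation", "device"}),
            frozenset({"note"}))

compatible_v2 = is_compatible(v1, v2)
compatible_v3 = is_compatible(v1, v3)
```

<P>

Here the move from v1 to v2 is compatible, while the move from v1 to v3 is not, because a document written before the change cannot be expected to contain the newly required element.</p>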
<P>
<h3>The New Business Language</h3>
<P>

Although once considered difficult to handle, XML has become a key element of business processes, and the volume of XML data increases every day. DB2 pureXML preserves the XML characteristics that developers love, while allowing them to avoid the clumsy, complicated techniques that once limited its potential. The result is a more flexible, efficient IT environment that can quickly adjust to changing business needs and opportunities. </p>
<hr width="60%" />
<em>Scott Bisang is a marketing communications specialist in IBM's Information Management group. </em>
<P>
<div class="Article_Sidebar_Larger">
<h2>Taming a Terabyte of XML Data</h2>
<P>

XML has emerged as a de facto standard for data exchange, service-oriented architecture (SOA), and message-based transaction processing. As companies accumulate increasing amounts of XML data, they require efficient ways to store, index, query, update, and validate XML documents. DB2 pureXML provides the capabilities to address this challenge. Executing the industry's first terabyte XML database benchmark, Intel and IBM have joined forces to demonstrate how large-scale XML processing is feasible with high-end performance.</p>
<h3>The Benchmark</h3>
<P>

TPoX (Transaction Processing over XML, http://tpox.sourceforge.net) is an open-source and application-level XML database benchmark that simulates a financial application scenario. Customer information, accounts, and orders to buy and sell securities in an online brokerage system are represented in XML format. The 1TB scale factor of the benchmark comprises 360 million XML documents. In the OLTP workload, 200 concurrent users execute a mix of 70 percent SQL/XML queries and 30 percent insert, update, and delete operations &mdash; without think time.</p>
<P>

The test system is an Intel Xeon Processor 7400 Series server, running 64-bit Linux and DB2 9.5 Fixpack 2. The system has four CPUs with six cores per CPU and a clock speed of 2.67 GHz. With a main memory of 64GB and 135 disks, the system is a well-balanced configuration for the target workload. The DB2 database used a 16K page size, automatic storage, and self-tuning for all memory and configuration parameters that can be set to automatic. XML inlining and compression were critical to reduce the storage footprint of the 1TB raw XML data. All system configurations and tests were performed by Intel at Intel Labs.</p>
<P>
<h3>The Score</h3>
<P>

After populating the database, DB2 compression reduced the 1TB of raw data by 64 percent. This compression was essential to avoid I/O bottlenecks and achieve adequate buffer pool hit ratios. On this database, the mixed TPoX workload reached a peak throughput of 6,763 transactions per second. Additional tests examined the incremental insert performance. Adding more XML documents to the populated data showed an ingestion rate of ~100GB/hour and up to 11,900 XML inserts per second.</p>
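<P>

For a rough sense of scale, the quoted figures can be turned into per-document and per-second numbers. The short Python calculation below is back-of-the-envelope arithmetic only: it assumes decimal units (1TB = 10**12 bytes), and it assumes the ingestion rate refers to raw XML volume, neither of which the benchmark summary states.</p>

```python
# Figures quoted in this sidebar; decimal units are an assumption.
RAW_BYTES = 10**12                     # 1TB of raw XML
DOCUMENTS = 360_000_000                # documents at the 1TB scale factor
COMPRESSION_SAVING = 0.64              # raw data reduced by 64 percent
INGEST_BYTES_PER_HOUR = 100 * 10**9    # ~100GB/hour incremental insert rate

avg_doc_bytes = RAW_BYTES / DOCUMENTS                    # average raw document size
compressed_bytes = RAW_BYTES * (1 - COMPRESSION_SAVING)  # footprint after compression
ingest_mb_per_second = INGEST_BYTES_PER_HOUR / 3600 / 10**6

print(f"average document: {avg_doc_bytes / 1000:.1f} KB")
print(f"compressed data:  {compressed_bytes / 10**9:.0f} GB")
print(f"ingest rate:      {ingest_mb_per_second:.1f} MB/s")
```

<P>

Under these assumptions, the average document is roughly 2.8KB, the compressed data occupies roughly 360GB, and the incremental ingestion rate works out to roughly 28MB per second.</p>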
<P>

<a href="/story/showArticle.jhtml?articleID=216500898">Read the full article</a> by Agustin Gonzalez of Intel Corp. and Matthias Nicola of IBM Silicon Valley Lab for complete details on the system and database configuration, additional scalability results, charts, and valuable lessons learned.</p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_f2_sidebar.jpg" width="400" height="257"></p>
<P>

Intel&reg; Xeon&reg; Processor 7400 Series is a trademark of Intel Corporation in the U.S. and other countries. All performance results are Intel internally measured as of Sept. 2, 2008. Different systems or configurations may produce different results.</p>
<P>

Other names and brands may be claimed as the property of others.</p></div>
<P>
<div class="Article_Sidebar_Larger">
<h2>XML and the DBA</h2>
<P>

XML use is growing thanks to its role in industry and technology standards and in modeling internal business data. DBAs need to be prepared to handle XML data in their environment. </p>
<P>

DB2 9 pureXML makes it efficient to store, manage, and query XML in a DB2 database. DBAs can include XML data in a DB2 database using their existing DBA tools and skills. In fact, choosing XML as the storage format in addition to a transaction format can make a DBA's life much simpler, because the inevitable data structure changes won't require changes to the database. The New York State Department of Taxation benefited from this approach when it implemented an XML-based solution using DB2 pureXML for tax process automation.</p>
<P>

DBAs can extend their sphere of influence, learn new skills, and enhance their value by learning more about XML. When using pureXML, DBAs can largely rely on their existing knowledge. But by learning a bit more about DB2 pureXML, they can contribute to application architecture discussions, such as the benefits of end-to-end XML processes, the trade-offs of keeping XML intact versus "shredding," and methods for using XML schemas for validating individual documents. DB2 pureXML has capabilities that can complement these choices for efficiently managing XML data in a dynamic enterprise. </p>
<P>

<a href="http://www.ibmdatabasemag.com/story/showArticle.jhtml?articleID=216402743">Read this online-only article </a>by Bryan Patterson and Dexiong Terry Zhang to learn: </p>
<ul>
  <li>How storing XML in the database will affect the DBA</li>
  <li>How DB2 pureXML manages XML </li>
  <li>How to use existing DBA skills with DB2 pureXML. </li>
</ul></div>
				]]></body>
		</item>
	
		<item>
			<title><![CDATA[Protecting Test-Data Privacy]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300321&cid=RSSfeed]]></link>
			<description><![CDATA[Security and access procedures in place for production environments don't always translate to test environments. But the same government regulations and industry requirements still apply.]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:15 EDT</pubDate>
			<keywords><![CDATA[Optim Data Privacy, Data Governance, Privacy Regulations, Data Encryption, Data Masking]]></keywords>
			<blurb><![CDATA[Security and access procedures in place for production environments don't always translate to test environments. But the same government regulations and industry requirements still apply.]]></blurb>
			<authors><![CDATA[Suzanne Schroeder]]></authors>
			<body><![CDATA[
			
					
<P>
<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/feature_image3sm.jpg" alt="Protecting Test-Data Privacy" class="Image_Float-Left" border="1" width="250" />Living in the information age has become both a blessing and a curse. People have access to every kind of information from almost any location. As a result, it's easy to pay bills, access account information, and make purchases without leaving the comfort of the living room. </p>
<P>

<P>
Unfortunately, all that information, which has to be collected, managed, and stored, has become a lucrative target for thieves. Ensuring that confidential data remains protected and secure across the enterprise has become a critical business issue. </p>
<P>

Yet news headlines about data privacy breaches have become almost as common as sports scores. Consider what happened at Certegy Check Services Inc., a subsidiary of Fidelity National Information Services Inc.: In 2007, a 2.3 million-record data breach at the company made headlines when a DBA stole banking and credit card information, then sold it to a broker. The broker subsequently sold some of the data to direct marketing organizations for solicitation. (See the article "<a href="http://searchsecurity.techtarget.com/news/article/0,289142,sid14_gci1263233,00.html" target="_blank">Malicious insider sells Fidelity National customer data</a>" for details.)</p>
<h3>The Need for Privacy Protection</h3>
<P>

Data thieves have been known to acquire backup tapes, disk drives, user IDs, and passwords. Hackers take advantage of vulnerable network or infrastructure security or poor server or database security standards. A 2009 article in <em>Baseline Magazine</em> suggests that disgruntled employees experiencing layoffs and personal financial woes may be tempted to turn to IT crimes to mitigate their losses. Whatever the motivation of the perpetrator, data breaches continue to occur. According to the Privacy Rights Clearinghouse, as of February 2009, the total number of records containing sensitive personal information involved in security breaches in the U.S. since 2005 has surpassed 253 million. </p>
<P>

But protecting information against thieves and hackers is only half the battle. As government regulations and industry requirements around the globe become more stringent, remaining compliant with privacy laws and industry regulations gets harder. Table 1 shows some of the data privacy laws and regulations enacted over the past ten years. </p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_f3_tab1.jpg" width="400" height="137"></p>
<P>

<strong>Table 1. Some industry regulations and standards in place today.</strong></p>
<P>

<P>
Noncompliance can result in myriad problems, including legal fees and monetary fines, jail time for executives, brand damage, and customer loss. In the midst of unprecedented numbers of security breaches, protecting privacy makes good business sense.</p>
<h3>What Data to Protect</h3>
<P>

Typically, there are three main types of data that organizations must be careful to safeguard:</p>
<ul>
  <li><strong>Customer data,</strong> including client names, addresses, account numbers, and payment card information</li>
  <li><strong>Employee data,</strong> including employee names, Social Security numbers, email addresses, and telephone numbers</li>
  <li><strong>Trade secrets,</strong> including financial data, cost of goods sold (inventory lists), and information about new product updates. </li>
</ul>
<P>

All this information is managed in various production databases and other company repositories. Most production environments have established security and access restrictions to protect against breaches. Standard security measures can be applied at the network level, application level, and database level. Organizations are even applying application development best practices as part of the daily process to ensure that application code is written to be more secure. However, these protective measures can't be replicated across every environment because the methods that protect data in production may not meet the unique requirements for protecting nonproduction (testing, development, and training) environments.</p>
<h3>Where to Protect Data</h3>
<P>

The standard methods for protecting privacy in production environments may not be as effective when applied to data in nonproduction environments, in which developers, testers, and trainers require access to realistic data. A December 2007 report from the Ponemon Institute found that 62 percent of companies surveyed used actual customer data to test applications during the development process. The report, called "The Insecurity of Test Data: The Unseen Crisis," also found that 50 percent of respondents had no way of knowing whether the data used in testing had been compromised. Fifty-two percent of respondents outsourced application testing and 49 percent shared live data. Finally, 26 percent of respondents said they did not know who was responsible for securing test data.</p>
<P>

Nonproduction environments can pose a tremendous threat to an organization and must be treated with the highest levels of security, with as much importance as a system in production. Standard nondisclosure agreements may not deter a disgruntled employee from taking a database full of actual Social Security numbers. An application developer's laptop containing a test database could be lost or stolen. Cloned databases sent to outsourcers for testing activities could be easily intercepted. </p>
<P>

Let's examine a simple, hypothetical testing scenario. XYZ retailer wants to improve its online ordering system to improve customer service and increase customer retention. The application improvements require testing and development activities, so copies, or clones, of production data are used to create more realistic and accurate testing conditions. Because these clones are taken directly from the production environment, they contain personally identifiable information such as names, addresses, payment card numbers, and so on. Once cloned, this information is then propagated from a secure production environment to a more vulnerable nonproduction environment.</p>
<P>

So why not just test in the production environment and save time and resources? Application data needs to be tested outside the production environment so that errors that occur during testing won't affect the live production system. Sometimes multiple testing environments are required, putting multiple copies of production information at risk. </p>
<P>

But test data need not contain personally identifiable information &mdash; it just needs to be realistic. Some companies believe that encrypted data will suffice. But encryption is not always adequate in nonproduction environments.</p>
<P>

				
					<h3>Why Encryption Isn't Enough</h3>
<P>

Encryption disguises data and converts it into an "encoded" format for privacy protection within a database. The data is then decrypted back to its original state for viewing in user interfaces, reports, and other development and testing activities. More often than not, developers and testers will see the decrypted data when viewing application screens, inputting data, running reports, and performing other development and testing activities. Although encryption can provide a sufficient "blanket" form of protection if data is stolen directly from a database, it may not protect against misappropriation once the data is decrypted (for instance, if users make copies of the data). When data is exported out of a database and into a spreadsheet or other file format, the encryption no longer applies, and the data is at risk. If data can be seen, it can be copied.</p>
<h3>Safeguarding Data in Nonproduction Environments</h3>
<P>

Ensuring data is protected in the event it falls into the wrong hands is the best solution. In <em>Beyond Fear: Thinking Sensibly About Security in an Uncertain World</em>, author Bruce Schneier writes, "We're not going to solve this by making data hard to steal. The way we're going to solve it is by making the data hard to use." </p>
<P>

De-identifying or masking is one way to ensure that stolen, exposed, or lost data will be of no use to anyone. In a nonproduction environment, data de-identification is the process of systematically removing, masking, or transforming confidential data elements that could identify an individual or that should otherwise not be made public. Data masking enables developers, testers, and trainers to use realistic data and produce valid results, while still complying with privacy protection rules. </p>
<P>
<h3>Privacy Protection Strategies</h3>
<P>

Most organizations already have a formalized application development life-cycle process in place. More organizations are now realizing the need to create data management or data governance strategies as well. Having a well-defined set of practices to protect data as it moves through its life cycle can help ensure that it remains protected in development, testing, and training activities. </p>
<P>

IBM's Integrated Data Management solutions offer companies a way to design, develop, deploy, operate, optimize, and govern enterprise data throughout its life cycle, from requirements to retirement. A comprehensive data privacy solution enables companies to efficiently and effectively meet data privacy challenges.</p>
<P>

<P>
Specifically, the IBM Optim Data Privacy Solution provides:</p>
<ul>
  <li>De-identification capabilities that mask confidential application data with realistic but fictional data</li>
  <li>Application-aware masking capabilities to ensure that masked data resembles the structure and characteristics of the original information</li>
  <li>Context-aware, prepackaged data masking routines that make it easy to de-identify data elements, such as payment card numbers, Social Security numbers and email addresses</li>
  <li>Persistent masking capabilities that propagate masked replacement values consistently across applications, databases, operating systems and hardware platforms</li>
  <li>Referential integrity of masked data to ensure successful testing and development</li>
  <li>Help to maintain compliance with national and global data privacy regulations and requirements</li>
  <li>Simple implementation and use.</li>
</ul>
<P>

With Optim, companies can de-identify data in a way that is valid for use in development, testing, and training environments, while protecting data privacy. Figure 1 shows sample production data before and after masking with Optim.</p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_f3_fig1.jpg" width="400" height="172"></p>
<P>

<strong>Figure 1. Optim's intelligent masking capabilities help protect sensitive information, like credit card numbers and Social Security numbers, from being exposed in nonproduction environments.</strong></p>
<P>
<h3>A Case for Data Masking</h3>
<P>

Let's take a look at an example of how Optim is used in the real world. A billion-dollar financial technology firm that uses the IBM Optim Data Privacy Solution offers products and services that drive account processing, electronic funds transfer, consumer healthcare payments, and more. To support its business operations, the company developed an end-to-end electronic payment application to manage daily transaction activities. Serving thousands of clients across industries, this innovative application processes millions of payments each month.</p>
<P>

With increasing regulatory pressures, and knowing that the trust and loyalty of its customers depend on protecting sensitive data effectively, senior management adopted a proactive leadership position on data privacy protection. The company developed enterprise policies for classifying consumer information, based on the Federal Deposit Insurance Corporation's (FDIC) definitions of personally identifiable financial information. Specifics included information collected during the application process, information acquired from a financial product or service transaction, or information obtained from a third party in connection with providing a financial product or service.</p>
<P>
<div class="Article_Sidebar_Float-Right" id="Article_Sidebar"> 
<h3>The IBM Optim Data Privacy Solution</h3>
<P>

The IBM Optim Data Privacy Solution provides a set of data masking techniques to support data privacy compliance requirements:</p>
<P>

<strong>Comprehensive data masking techniques. </strong>Optim provides a variety of masking options, including simple techniques that mask character or numeric data or generate random or sequential numbers, and more advanced masking routines that support complex data privacy requirements.</p>
  
<P>
<strong>Support for application logic. </strong>Optim's data masking techniques respect the application logic and make sense to the person viewing the results. In other words, the masked data resembles the original information. Numeric fields retain the appropriate structure and pattern and must remain within a range of permissible values, so that functional tests pass all application validity checks. </p>
  
<P>
<strong>Support for business context data elements. </strong>Data masking with Optim respects the business context of specific data elements. For example, prepackaged capabilities accurately mask Social Security numbers, credit card numbers, and email addresses.</p>
  
<P>
<strong>Capabilities that preserve the data integrity. </strong>Optim automatically masks and propagates masked data elements accurately across related tables, as well as applications, databases, operating systems, and hardware platforms, to ensure valid test results.</p>
  
<P>
<strong>Mask-and-move or mask-in-place. </strong>Optim makes it possible to extract and mask data and then insert or load the data into one or more destination nonproduction databases. In addition, mask-in-place capabilities enable the de-identification of data extracted using third-party tools and of data that already resides in cloned nonproduction environments. These options provide flexibility for organizations that have data in place for testing or that use backup facilities to create those test databases. Masking data directly where it resides eliminates the need to move data for additional processing and still preserves the referential integrity of the data.</p>
</div>
<P>

<P>
The IT group considered the data classification and retention programs in place and applied encryption techniques to protect data on laptops and BlackBerry devices. Primary security for production data residing on servers was managed using encryption, access controls, and the network infrastructure. However, the development and testing environments presented unique challenges. Simply replicating production safeguards for these environments would not be sufficient.</p>
<P>

Supporting privacy compliance would require removing, masking, or transforming elements that could be used to identify an individual. De-identified data would be acceptable to use in open testing environments. Masking techniques would have to propagate de-identified data accurately, while preserving the referential integrity to support reliable testing.</p>
<P>

Optim provided a variety of proven masking capabilities for de-identifying the firm's data. Using substrings, random or sequential numbers, arithmetic expressions, date aging, and other techniques, Optim substituted customer data with contextually accurate but fictionalized data to produce accurate test results. Optim was scalable across applications, databases, operating environments, and hardware platforms.</p>
<P>

Optim's capabilities satisfied requirements to protect customer information in the development and testing environments. The consistent approach for managing test data improved operational efficiencies and lowered costs. The ability to secure confidential data helped reduce legal risks that could have resulted in financial penalties. In addition, the company was able to maintain its strategic business advantage by engendering customer trust, which in turn increased revenue opportunities.</p>
<h3>The Bottom Line</h3>
<P>

The need for privacy protection and confidentiality spans industries and global boundaries. Although protecting data is a necessary step in production and nonproduction environments alike, the same privacy protection measures can't be used in both areas. Methods for protecting privacy in production environments simply don't support the needs of the development, testing, quality assurance, and training teams for realistic data. </p>
<P>

As companies start to shift their attention to protecting data in nonproduction environments, they're realizing that de-identification is a best practice for masking sensitive data and protecting privacy. The IBM Optim Data Privacy Solution offers companies a method to de-identify data in a way that is valid for use in nonproduction environments, while still protecting data privacy. Implementing Optim helps organizations comply with data privacy regulations and protect the confidentiality of sensitive information across the enterprise &mdash; and helps keep companies out of the headlines. </p>
<hr width="60%"/>
<P>

<em><strong>Suzanne Schroeder</strong> is an Optim marketing communications writer in the IBM Software Group.</em></p>
<P>

				]]></body>
		</item>
	
		<item>
			<title><![CDATA[Data Architect: DB2 Data Warehouse Performance, Part 1]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300344&cid=RSSfeed]]></link>
			<description><![CDATA[Translate OLTP-tuning skills into effective performance management for DB2-based business intelligence systems.]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:13 EDT</pubDate>
			<keywords><![CDATA[DB2 for z/OS, OLTP Performance Tuning, Data Warehouse Tuning, Robert Catterall, Data Architect, DB2 DBA]]></keywords>
			<blurb><![CDATA[Translate OLTP-tuning skills into effective performance management for DB2-based business intelligence systems.]]></blurb>
			<authors><![CDATA[Robert Catterall]]></authors>
			<body><![CDATA[
			
					
<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/columns/catterall_robert.jpg" alt="Robert Catterall" width="90" height="90" border="1" class="Image_Float-Left">
During a data warehousing presentation at the <a href="http://idug.org" target="_blank">International DB2 Users Group</a> conference in Warsaw last October, IBM's Willie Favero showed text from the 1983 announcement letter that introduced the DB2 database management system. That announcement positioned DB2 as an excellent foundation for decision-support applications (this was before the term "data warehouse" had been coined).</p>
<P>

I well remember the DB2 launch, because I was working for IBM at the time. DB2 was indeed a great choice for business intelligence (BI) systems, but its programming productivity benefits &mdash; an English-like data manipulation language and DBMS-determined data access path selection &mdash; were equally appealing to IT people who were responsible for building and maintaining operationally focused applications. IBM responded to the voice of the customer by delivering a slew of performance enhancements over several DB2 releases that enabled dramatic increases in application throughput. DB2 &mdash; on Linux, Unix, and Windows (LUW) servers and on mainframes &mdash; became hugely popular as a data server for transaction processing workloads, with plenty of those being in the 1,000+ transactions-per-second category.</p>
<P>

These days, interest in data warehousing is rising across industries as companies seek to extract actionable intelligence from their data assets to drive better decision-making. Businesses and government entities that have long used DB2 for operational applications are now building DB2-based BI systems. That development has some DB2 online transaction processing (OLTP) veterans wondering if they can be equally effective in supporting a data warehouse, particularly when it comes to performance management. This article is for those professionals. The subject is broad in scope, so I'll cover it in two installments.</p>
<h3>Data Warehouse vs. OLTP Performance Management: Apples and Oranges</h3>
<P>

Performance management in a DB2 data warehouse environment isn't the same as monitoring and tuning a DB2-based OLTP application. Here are some of the key differences:</p>
<P>

<strong>Individual SQL statements vs. transactions.</strong> In an OLTP environment, the focus is on transactions that typically contain multiple SQL statements and are often expected to complete in less than a second. For a BI application, a "transaction" (a user interaction with the system) may involve the execution of just one SQL statement, and that statement may run for several minutes or even an hour or more without being thought of as "slow." A user might be very happy to get a report in an hour if it once took 10 hours to run. </p>
<P>

<strong>Facts and dimensions.</strong> A DB2 database used for OLTP work will likely have a traditional third-normal-form design (or something close to that). Data warehouse database designs, on the other hand, are often dimensional in nature, with sets of related tables arranged in "star schemas" (a central "fact" table and associated dimension tables).</p>
<P>

<strong>Continuous vs. overnight database updates. </strong>For an OLTP application, database updates tend to occur around the clock, seven days a week. On the BI side, despite growing interest in near real-time updating of database values, a data warehouse database is typically updated at night, often via massive extract, transform, load (ETL) runs. Query access is typically unavailable during ETL processing, making timely completion of the database update process an imperative.</p>
<P>

<strong>Small vs. large result sets. </strong><code>SELECT</code> statements in OLTP transaction programs typically retrieve just a few database rows (often only one or two). Data warehouse queries &mdash; particularly those used to generate reports or online analytical processing (OLAP) cubes &mdash; may return hundreds of thousands (or even millions) of rows.</p>
<P>

<strong>Simple vs. complex queries. </strong><code>SELECT</code> statements in OLTP transaction programs are often quite simple: one or two tables accessed, little or no dynamic table-building, and little or no on-the-fly transformation of data values or types. Queries associated with BI applications might be several pages long, with joins of a dozen or more tables, nested or common table expressions, recursive SQL, data value-changing <code>CASE</code> expressions, and data-type transformation via <code>CAST</code> specifications or scalar functions.</p>
<P>

<P>
<strong>SQL you have to deal with vs. SQL you wrote (or at least reviewed).</strong> In a data warehouse environment, SQL is often generated by reporting or OLAP tools, with no opportunity for you to change it before it's executed. It'll be up to you to set up a DB2 environment in which such queries can run well.</p>
<P>

Basically, a DB2 professional helps to deliver good data warehouse performance by getting two things right: </p>
<ul>
  <li>Setting up the DB2 environment so as to give queries the best chance of running well</li>
  <li> Effectively tuning queries that are running too long (in spite of your having done a good job with the first task). </li>
</ul>
<P>

This article focuses on setting up the DB2 environment. I'll deal with the tuning of data warehouse SQL statements in the next issue. </p>
<h3>Getting the DB2 Environment Right</h3>
<P>

Having a DB2 data warehouse environment that promotes good query performance involves both system-level and database-level actions. With respect to the DB2 system, pay attention to the following:</p>
<P>

<strong>Leverage 64-bit addressing.</strong> Big DB2 buffer pools are always helpful, but they are especially useful for I/O-intensive data warehouse workloads. Many experienced DB2 people who are accustomed to working within the confines of a 2GB (mainframe) or 4GB (Linux/Unix/Windows) memory space have been a little slow to get with the 64-bit program. </p>
<P>

Server memory sizes are humongous these days: You can get a terabyte or more of system memory on an IBM System z mainframe, System p server (AIX or Linux), or System x server (Windows or Linux). DB2 on any of those platforms will support a buffer-pool configuration size of at least a terabyte. </p>
<P>

If you have a server with a large memory resource, you need to think big in terms of DB2 buffer pools. I've seen DB2 running on a server with 40GB of system memory with an 800MB buffer pool configuration. That's way too small: 10-20GB (at least) would be more appropriate in that case. </p>

				
					
<P>
Keep in mind that as you increase the size of a buffer pool, you want to see a reduction in disk read I/O activity. You also may want to let DB2 handle buffer-pool sizing for you. Automatic memory management is already proving to be a popular feature of DB2 9 for LUW, and now DB2 9 for z/OS (in concert with the z/OS Workload Manager) can manage buffer-pool sizing via the <code>AUTOSIZE(YES)</code> option of the <code>ALTER BUFFERPOOL</code> command.</p>
<P>

<strong>Leverage query parallelism.</strong> DB2 can divide the work needed to process a query into pieces and execute those pieces in parallel, substantially reducing run time. Taking advantage of this feature requires enablement via parameter specifications. For DB2 for LUW, parallelism within a single server is enabled by setting the value of the <code>intra_parallel</code> database manager configuration parameter to <code>YES</code>. On both the mainframe and LUW platforms, parallelism for a given dynamic <code>SELECT</code> statement (dynamic SQL tends to be the rule in a data warehouse environment) depends on the setting of the <code>CURRENT DEGREE</code> special register. If the value of <code>CURRENT DEGREE</code> (which applies to a given client-DB2 connection as opposed to being a system-wide value) is <code>1</code>, a query won't be parallelized; if the value is <code>ANY</code>, a query can be parallelized (if the optimizer determines that this would improve performance), and DB2 determines the degree of parallelism. </p>
<P>

At many sites, the preferred approach is to set the default value of <code>CURRENT DEGREE</code> to <code>1</code> (no parallelism) and to change that to <code>ANY</code> for particular queries by using the SQL statement <code>SET CURRENT DEGREE</code>. If the use of <code>SET CURRENT DEGREE</code> is not an option (some query-generating tools might not support the use of this statement), the default value of the special register can be set to <code>ANY</code> via the <code>CDSSRDEF</code> ZPARM parameter for DB2 for z/OS, and via the <code>dft_degree</code> database configuration parameter for DB2 for LUW. </p>
<P>

In addition to being able to split a query within a single server, DB2 can split a query across multiple servers in a cluster configuration through DB2 for z/OS sysplex query parallelism and the Database Partitioning Feature of DB2 for LUW. Keep in mind that DB2 for z/OS query parallelism is a great way to drive utilization of zIIP engines, specialized mainframe processors that can deliver computing cost savings to your organization.</p>
<P>
<h3>Database-Level Performance Boosters</h3>
<P>

A DB2 data warehouse environment is made still more query-friendly through these database-related actions:</p>
<P>

<strong>Range-partition large tables.</strong> Range partitioning, by which a table's rows are stored in several different physical files based on key ranges, has been around a long time on the mainframe DB2 platform and was made available for LUW servers with DB2 9. It's particularly important as a driver of query parallelism on mainframe servers. It also can boost query performance on LUW servers through a query optimization technique known as data partition elimination. </p>
<P>

There aren't any hard-and-fast rules with regard to identifying tables that should be range-partitioned, but a good starting point might be tables with 1 million or more rows. The selection of a partitioning key for a table (it can be single- or multi-column) will depend on your needs, but consider that a time-based key (a date column, for example) can make for a very efficient data-archival process. Note that range partitioning is different from the hash-partitioning algorithm that DB2 for LUW uses to distribute rows of a table across the nodes of a multiserver cluster when the Database Partitioning Feature is utilized.</p>
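<P>
To make the idea of data partition elimination concrete, here is a minimal Python sketch of how a date-range predicate narrows the set of partitions that need to be read. It illustrates the concept only and is not how DB2 implements it; the quarterly boundaries and the function names are hypothetical.</p>

```python
from bisect import bisect_right
from datetime import date

# Hypothetical quarterly partitions: each entry is the first date a partition covers.
PARTITION_STARTS = [date(2008, 1, 1), date(2008, 4, 1), date(2008, 7, 1), date(2008, 10, 1)]

def partition_for(day):
    """Return the index of the partition whose key range contains `day`."""
    idx = bisect_right(PARTITION_STARTS, day) - 1
    if idx == -1:
        raise ValueError("date precedes the first partition")
    return idx

def partitions_to_scan(first_day, last_day):
    """Partitions that can hold rows whose key lies between the two dates (inclusive)."""
    return list(range(partition_for(first_day), partition_for(last_day) + 1))

# A query for mid-May through August 1 only needs the second and third partitions.
print(partitions_to_scan(date(2008, 5, 15), date(2008, 8, 1)))  # [1, 2]
```

<P>
The other two partitions are never touched, which is where the run-time saving comes from.</p>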
<P>

<strong>Keep rich, accurate catalog statistics.</strong> DB2 has, in my opinion, the best query optimizer on the market (IBM invented cost-based SQL statement optimization), but it has to be able to make informed access-path decisions to work well. The key input to query optimization is the statistical information in the DB2 catalog tables, so it's important that these statistics be kept current. The best way to do this is to run <code>RUNSTATS</code> (a command on DB2 for LUW systems, a utility in a DB2 for z/OS environment) on a regular basis. Note that DB2 9 for LUW has an automatic statistics collection feature. </p>
<P>

Rich catalog statistics are also important (the more DB2 knows about the data in the database, the more likely that queries will perform well), so collect what you can: statistics for indexes and for tables, and cardinality and distribution information for as many columns as practically possible. Remember, the more information <code>RUNSTATS</code> gathers, the more CPU time it consumes, so if you are tight on processing capacity you might need to restrict column-statistics generation to those columns that are used &mdash; or are expected to be used &mdash; in query predicates.</p>
<P>

<strong>Leverage indexing.</strong> People are often quite conservative when it comes to creating indexes on tables in a database that supports OLTP applications, and for good reason: Each new index defined on a table will make every <code>INSERT</code> and <code>DELETE</code> operation more expensive (<code>UPDATE</code>s also become more costly when the updated columns are part of an index key). </p>
<P>

In a BI environment, there tends to be more of an emphasis on the performance of data-retrieval (versus data change) operations; therefore, it typically makes sense to have more indexes on tables in a data warehouse database than you'd have in an OLTP database. Still, you can go too far with indexing data warehouse tables, preventing the periodic ETL data-update process from completing in the time required (resulting in a delayed "opening" of the data warehouse for query purposes &mdash; something that can really upset users). </p>
<P>

In an OLTP environment, I generally like to limit the number of indexes per table to four or five, on average. When the database is part of a BI system, I'm more comfortable with as many as eight to 10 indexes on a table &mdash; but I wouldn't start out with that many. Instead, I'd like to come online initially with five or six indexes per table, in case I subsequently want to define additional indexes to boost the performance of particular queries that need run-time reduction.</p>
<P>

<strong>Use clustering wisely.</strong> Data clustering (the physical ordering of rows in tables) is a big deal in a data warehouse environment, because rows are usually retrieved in bunches (as opposed to the smaller result sets typical of OLTP systems). When you're going after a lot of rows, locality of reference (having the desired rows physically close to each other in a target table) can make a big difference in query run times. Give plenty of thought to what users are going to want &mdash; all rows for a given customer ID? For a given product? For a given date range? </p>
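<P>
A rough way to see why locality of reference matters is to count how many pages hold the rows for one customer under two row orderings. The Python sketch below is a toy model, not a DB2 measurement; the rows-per-page figure, the table size, and the customer ID are all hypothetical.</p>

```python
ROWS_PER_PAGE = 50          # hypothetical number of rows that fit on one page
TOTAL_ROWS = 100_000
CUSTOMERS = 1_000           # each customer ID owns TOTAL_ROWS / CUSTOMERS = 100 rows

def pages_touched(row_positions):
    """Count the distinct pages that hold the given row positions."""
    return len({pos // ROWS_PER_PAGE for pos in row_positions})

target = 42  # customer ID being queried

# Clustered by customer ID: a customer's rows occupy consecutive positions.
clustered = [pos for pos in range(TOTAL_ROWS) if pos // (TOTAL_ROWS // CUSTOMERS) == target]

# Not clustered: rows arrive round-robin, so a customer's rows are spread out.
scattered = [pos for pos in range(TOTAL_ROWS) if pos % CUSTOMERS == target]

print(pages_touched(clustered), pages_touched(scattered))  # 2 100
```

<P>
In this model the same 100 rows sit on 2 pages when clustered and on 100 pages when scattered, which is the difference a well-chosen clustering key is meant to capture.</p>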
<P>

If two or more clustering keys make sense for a given table, by all means take advantage of the multidimensional clustering (MDC) feature of DB2 9 for LUW. Mainframe DB2 people can achieve multidimensional clustering by taking advantage of the table partitioning enhancements delivered in DB2 for z/OS V8, including, in particular, the ability to partition data on one key and to cluster rows within partitions by another key.</p>
<h3>Here Come the Queries</h3>
<P>

So, you have your DB2 system and database set up for good data warehouse performance. Then the users unleash their queries and report requests &mdash; and, lo and behold, some of them end up running longer than desired. Now what? Well, now you have to analyze those queries and figure out how to get them to run faster. I'll share some useful information regarding that activity in the next issue. Until then, happy data warehousing. </p>
<hr width="60%">
<P>

<P>
<em>Robert Catterall is president of Catterall Consulting, a firm that helps clients apply DB2 technology to address data management challenges and opportunities.</em></p>

				]]></body>
		</item>
	
		<item>
			<title><![CDATA[Distributed DBA: Storage, I/O, and DB2, Part 2]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300346&cid=RSSfeed]]></link>
			<description><![CDATA[Fine-tune variables and table-space characteristics to end I/O-related performance problems.]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:12 EDT</pubDate>
			<keywords><![CDATA[Deploying DB2 Databases on Network Storage, Distributed DBA, Roger E. Sanders, DB2 Performance, Storage Administrator, Database Administration]]></keywords>
			<blurb><![CDATA[Fine-tune variables and table-space characteristics to end I/O-related performance problems.]]></blurb>
			<authors><![CDATA[Roger E. Sanders]]></authors>
			<body><![CDATA[
			
					
<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/columns/sanders_roger.jpg" alt="Roger Sanders" width="90" height="90" class="Image_Float-Left" border="1" />
In my previous column, I introduced you to some <a href="http://www.ibmdatabasemag.com/story/showArticle.jhtml?articleID=211300267">basic DB2 storage concepts</a> and pointed out that network storage provisioning can have a significant effect on database performance. In this installment, I'll show you how to set an important storage I/O-related registry variable and how to fine-tune table-space characteristics, such as extent size, prefetch size, overhead, and transfer rate, when a DB2 database is deployed in a network-attached storage (NAS) or storage-area network (SAN) environment. </p>
<h3>Table-Space Performance </h3>
<P>

If you read <a href="http://www.ibmdatabasemag.com/story/showArticle.jhtml?articleID=211300267">part 1</a> in this series, you know that RAID, which stands for redundant array of independent disks, is used to combine two or more disk drives into an array that can then be presented to a host as a single logical disk drive. If your database contains table spaces whose containers reside on a RAID device (which will most likely be the case if your database is deployed on NAS or SAN), DB2 experts at IBM recommend that you do the following:</p>
<ul>
  <li>Set the <code>DB2_PARALLEL_IO</code> registry variable to enable parallel I/O for each table space used if the table space's containers span multiple physical disks.</li>
  <li>Make the extent size of each table space equal to, or a multiple of, the RAID stripe size if striping is used. (If RAID 1 is used, the default extent size is appropriate in most cases.) </li>
  <li>Ensure that the prefetch size for each table space is either assigned the value <code>AUTOMATIC</code> or is assigned a value that is both equal to the RAID stripe size multiplied by the number of RAID devices used (or a whole multiple of this product) <em>and</em> a multiple of the extent size of the table space.</li>
  <li>Ensure that the overhead and transfer rate for each table space are appropriate for the types of disk drives being used.</li>
</ul>
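<P>
The prefetch-size rule in the list above can be written as a small check. This Python sketch assumes every size is expressed in pages (converting the RAID stripe size to pages is an assumption made here so the units line up), and the function name and sample values are hypothetical.</p>

```python
def prefetch_size_ok(prefetch_pages, stripe_pages, raid_devices, extent_pages):
    """Check a fixed prefetch size against the two conditions in the list above.

    All sizes are in pages; expressing the RAID stripe size in pages is an
    assumption made here so that the units line up.
    """
    full_stripe = stripe_pages * raid_devices
    return prefetch_pages % full_stripe == 0 and prefetch_pages % extent_pages == 0

print(prefetch_size_ok(64, 16, 2, 32))  # True: 64 is 2 x (16 x 2) and a multiple of 32
print(prefetch_size_ok(48, 16, 2, 32))  # False: 48 is not a multiple of 32
```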
<P>

Keep in mind that in a NAS or SAN environment, IBM recommends that you use one of the following:</p>
<ul>
  <li>Automatic resizing of database managed space (DMS) table spaces that use files for their storage containers (<code>FILE</code>) </li>
  <li>Automatic storage table spaces to store permanent data</li>
  <li>System managed space (SMS) or automatic storage table spaces for temporary data. </li>
</ul>
<P>

DMS <code>FILE</code> and automatic storage table spaces are enabled for automatic resizing by specifying the <code>AUTORESIZE YES</code> option with either the <code>CREATE TABLESPACE</code> or the <code>ALTER TABLESPACE</code> SQL statement.</p>
<P>

In decision-support and data warehouse environments, in which heavy report generation workloads are common, you can use DMS <code>FILE</code> table spaces to hold temporary data. However, these temporary table spaces should never be enabled for automatic resizing. </p>
<h3>The <code>DB2_PARALLEL_IO</code> Variable</h3>
<P>

You can use the <code>DB2_PARALLEL_IO</code> registry variable to force DB2 to use parallel I/O for table spaces that only have one container, or for table spaces whose containers reside on more than one physical disk (which is the case if the container resides on a RAID 5 or a RAID 6 device). If this registry variable isn't set, the level of I/O parallelism used is equal to the number of containers used by the table space. Therefore, if a table space spans three containers and the <code>DB2_PARALLEL_IO</code> registry variable hasn't been set, the level of I/O parallelism used is 3.</p>
<P>

On the other hand, if this registry variable is assigned a value, the level of I/O parallelism used depends on the table space's prefetch size. If the prefetch size is <code>AUTOMATIC</code>, the level of parallelism is equal to the number of containers used <em>multiplied</em> by the value stored in the <code>DB2_PARALLEL_IO</code> registry variable. If the prefetch size isn't <code>AUTOMATIC</code>, the parallelism of the table space is equal to the prefetch size divided by the extent size of the table space. Therefore, if the <code>DB2_PARALLEL_IO</code> registry variable has been set for a table space that has a prefetch size of 160 pages and an extent size of 32 pages, each prefetch request will be broken into five extent-sized prefetch requests (160 / 32 = 5).</p>
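<P>
The two worked cases above (registry variable unset, and registry variable set with an explicit prefetch size) can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function and parameter names are hypothetical, and a prefetch size of <code>AUTOMATIC</code> is deliberately left out.</p>

```python
def io_parallelism(containers, registry_set, prefetch_pages=None, extent_pages=None):
    """Level of I/O parallelism for a table space with a fixed prefetch size.

    Covers only the two cases worked through in the text; a prefetch size of
    AUTOMATIC is deliberately left out of this sketch.
    """
    if not registry_set:
        # Registry variable unset: one parallel I/O per container.
        return containers
    # Registry variable set: prefetch size divided by extent size.
    return prefetch_pages // extent_pages

print(io_parallelism(3, registry_set=False))                                      # 3
print(io_parallelism(3, registry_set=True, prefetch_pages=160, extent_pages=32))  # 5
```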
<P>

<P>
Often, the <code>DB2_PARALLEL_IO</code> registry variable is assigned the asterisk (<code>*</code>) value to indicate that every table space in the database is to use parallel I/O. (The asterisk value implies that each table space container used spans six physical data disk spindles.)</p>
<P>

However, in most cases, this setting isn't the correct one; using this value for table-space containers residing on anything other than RAID 5 6+1 arrays will result in a mismatch between the way DB2 attempts to parallelize I/O and the way data is actually striped across disks, which in turn will hurt performance. Instead, you should set the <code>DB2_PARALLEL_IO</code> registry variable by executing a <code>db2set</code> command that looks like this:</p>
<P>

<code>db2set DB2_PARALLEL_IO=&#91;TS_ID&#93;:&#91;DisksPerContainer&#93;,&hellip;</code></p>
<P>

where:</p>
<ul>
  <li><em>TS_ID</em> identifies one or more individual table spaces by numeric table space ID. (An asterisk can be used to indicate all table spaces.)</li>
  <li><em>DisksPerContainer</em> identifies the number of physical data disks used by each table space container that is assigned to the table space specified.</li>
</ul>
<P>

So, to set the <code>DB2_PARALLEL_IO</code> registry variable for a table space whose numeric ID is <code>1</code> to reflect that its storage containers reside on a RAID 5 3+1 array (three data disk spindles), you would execute a <code>db2set</code> command such as this:</p>
<P>

<code>db2set DB2_PARALLEL_IO=1:3</code></p>
<P>

To set the <code>DB2_PARALLEL_IO</code> registry variable to indicate that the storage containers for every table space in the database reside on a RAID 5 4+1 array (four data disk spindles), you would execute a <code>db2set</code> command more like this:</p>
<P>

<code>db2set DB2_PARALLEL_IO=*:4</code></p>
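<P>
If you keep a record of which array layout backs each table space, the value for <code>DB2_PARALLEL_IO</code> can be derived rather than typed by hand. The Python sketch below assumes layouts are written in the "data+parity" form used above (for example, 3+1); the helper names are hypothetical, and the script only prints the command text without running it.</p>

```python
def data_disks(raid_layout):
    """Return the data-disk count from a layout string such as '4+1'."""
    data, _parity = raid_layout.split("+")
    return int(data)

def parallel_io_value(layouts):
    """Build the DB2_PARALLEL_IO value from {table space ID or '*': layout}."""
    return ",".join(f"{ts_id}:{data_disks(layout)}" for ts_id, layout in layouts.items())

print("db2set DB2_PARALLEL_IO=" + parallel_io_value({1: "3+1"}))    # db2set DB2_PARALLEL_IO=1:3
print("db2set DB2_PARALLEL_IO=" + parallel_io_value({"*": "4+1"}))  # db2set DB2_PARALLEL_IO=*:4
```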
<h3>The <code>num_ioservers</code> Configuration Parameter</h3>
<P>

I/O servers, also called prefetchers, are used on behalf of database agents to perform prefetch I/O and asynchronous I/O for backup and other utilities. The <code>num_ioservers</code> database configuration parameter specifies the number of I/O servers for a database; no more than this number of prefetch and utility I/Os can be in progress for the database at any given point in time. Non-prefetch I/Os are scheduled directly from database agents and, as a result, aren't constrained by the value assigned to the <code>num_ioservers</code> database configuration parameter.</p>

				
					
<P>
To fully exploit all the I/O devices in a database that's using automatic storage, this configuration parameter should be assigned the value <code>AUTOMATIC</code>. If you aren't using automatic storage or the value <code>AUTOMATIC</code> isn't recognized (as is the case with earlier versions of DB2), this configuration parameter should be assigned a number that is one or two more than the number of physical devices on which the database resides. It's better to configure a few additional I/O servers and not use them than not to configure enough; a minimal amount of overhead results from each one. </p>
<h3>Table-Space Extent Size</h3>
<P>

Data is transferred to and from containers in 4K, 8K, 16K, or 32K blocks called pages. When a table space spans multiple containers, data is written in groups of pages (called extents) to each container in a round-robin fashion. Thus, a table space's extent size is essentially its stripe size if more than one container is used. </p>
<P>

A table space's extent size is defined as part of the table space creation process and can't be changed without dropping and recreating the table space. Therefore, it's important to choose the right extent size for a table space before it becomes populated with data. </p>
<P>

What's a good extent size to use? Ideally, you should specify an extent size that's large enough to contain one full RAID stripe so that every physical disk spindle within each LUN will spin together when one prefetcher does a large block read. Therefore, to determine an appropriate extent size, you need to solve the following equation: </p>
<P>

<code>extent size = (RAID stripe size &times; number of data disks) / table space page size</code></p>
<P>

Let's say you have a table space that spans three containers and each container resides on a RAID 5 4+1 array. If the RAID stripe size is 8K and the table space page size is 8K, an appropriate extent size would be 4 pages (8K &times; 4 = 32K; 32K / 8K = 4 pages). </p>
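<P>
The same arithmetic can be expressed as a small function. This is only a sketch of the equation above; the function and parameter names are illustrative, and sizes are given in kilobytes.</p>

```python
def extent_size_pages(stripe_size_kb: int, data_disks: int,
                      page_size_kb: int) -> int:
    """Return the extent size, in pages, that covers one full RAID stripe."""
    full_stripe_kb = stripe_size_kb * data_disks
    if full_stripe_kb % page_size_kb != 0:
        raise ValueError("full stripe is not a whole number of pages")
    return full_stripe_kb // page_size_kb


# RAID 5 4+1 (four data disks), 8K stripe size, 8K pages
print(extent_size_pages(stripe_size_kb=8, data_disks=4, page_size_kb=8))  # prints 4
```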
<h3>Table-Space Prefetch Size</h3>
<P>

When only one or just a few consecutive pages are retrieved from a DB2 database, data is transferred from storage to memory one page at a time. But when a lot of data is needed, DB2 retrieves additional pages from disk in anticipation that they will be needed soon. This behavior is known as <em>prefetching</em>; prefetching index and data pages into memory can help improve performance by reducing I/O wait time.</p>
<P>

Like the extent size, a table space's prefetch size is defined during the table-space creation process. However, unlike extent size, a table space's prefetch size <em>can</em> be changed without having to drop and recreate the table space. </p>
<P>

When it comes to deciding on the optimal prefetch size to use, IBM recommends that you let DB2 make the decision for you. To do so, you either assign a table space a prefetch size of <code>AUTOMATIC</code>, or assign the value <code>AUTOMATIC</code> to the <code>dft_prefetch_sz</code> database configuration parameter and accept that default whenever new table spaces are created.</p>
<P>

For table spaces created in this manner, DB2 will determine the optimal prefetch size to use (and update that value periodically) by solving the following equation:</p>
<P>

<code>prefetch size = number of table space containers &times; number of data disks per container &times; extent size</code></p>
<P>

The number of physical disks per container defaults to <code>1</code>, unless a different value is specified via the <code>DB2_PARALLEL_IO</code> registry variable (which is a good reason why the <code>DB2_PARALLEL_IO</code> registry variable needs to be assigned the correct value).</p>
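<P>
To see how much the registry variable matters, here is a sketch of the prefetch size equation above. The function name is illustrative; the default of one disk per container mirrors the behavior described when <code>DB2_PARALLEL_IO</code> isn't set.</p>

```python
def automatic_prefetch_size(containers: int, extent_size_pages: int,
                            disks_per_container: int = 1) -> int:
    """Prefetch size in pages, following the equation in the text."""
    return containers * disks_per_container * extent_size_pages


# Three containers on RAID 5 4+1 arrays with a 4-page extent size
print(automatic_prefetch_size(3, 4, disks_per_container=4))  # prints 48
# Same table space when the disks-per-container value is left at 1
print(automatic_prefetch_size(3, 4))                         # prints 12
```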
<h3>Overhead</h3>
<P>

The term "overhead" refers to I/O controller overhead as well as disk latency time, which includes disk seek time in milliseconds (ms). Overhead is used by the DB2 optimizer to determine the cost of I/O during query optimization. You can define a table space's overhead during the table space creation process, and you can change it after a table space has been created.</p>
<P>

For a database created using DB2 9 or later, the default overhead value used is 7.5 ms. However, the default may not always be the appropriate value to use for your particular storage platform. To estimate actual overhead cost, solve the following equation:</p>
<P>

<code>overhead = average seek time of disks used, in milliseconds + (0.5 &times; rotational latency)</code></p>
<P>

where <code>0.5</code> represents an average overhead for one half rotation and <code>rotational latency</code> is calculated in milliseconds for each full rotation, as follows: </p>
<P>

<code>(1 / disk rpm) &times; 60 &times; 1000</code></p>
<P>

You divide by rotations per minute to get minutes per rotation, multiply by 60 seconds per minute, and finally, multiply by 1000 milliseconds per second.</p>
<P>

For example, if a disk drive is rated at 7,200 RPM, you would calculate its rotational latency as follows:</p>
<P>

<code>(1 / 7200) &times; 60 &times; 1000 &asymp; 8.333 ms</code></p>
<P>

If the average seek time for this disk is assumed to be 11 ms, you can calculate the overhead as follows: </p>
<P>

<code>Overhead = 11 + (0.5 &times; 8.333) &asymp; 15.167 ms</code></p>
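<P>
If you need to repeat this estimate for several disk models, the two equations can be wrapped in a pair of functions. This is a sketch of the arithmetic only; the function names are illustrative.</p>

```python
def rotational_latency_ms(rpm: float) -> float:
    """Milliseconds for one full rotation: (1 / rpm) * 60 * 1000."""
    return (1 / rpm) * 60 * 1000


def overhead_ms(avg_seek_ms: float, rpm: float) -> float:
    """Average seek time plus half a rotation, in milliseconds."""
    return avg_seek_ms + 0.5 * rotational_latency_ms(rpm)


# 7,200 RPM drive with an assumed 11 ms average seek time
print(f"{rotational_latency_ms(7200):.1f} ms per rotation")  # prints 8.3 ms per rotation
print(f"{overhead_ms(11, 7200):.1f} ms overhead")            # prints 15.2 ms overhead
```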
<h3>Transfer Rate</h3>
<P>

The transfer rate is the time (in milliseconds) it takes to read one page into memory. The DB2 optimizer uses this value to determine I/O costs during query optimization. As with overhead, a table space's transfer rate can be defined during the table space creation process and can be changed after a table space has been created.</p>
<P>

For a database created with DB2 9 or later, the default time to read one 4K page into memory is 0.06 ms. Again, the default value may not be appropriate for your particular storage platform. If each table space container resides on a single physical disk, you can use the following equation to estimate the actual transfer cost in milliseconds per page:</p>
<P>

<code>Transfer rate = (1 / specification_rate) &times; 1000 / 1024000 &times; page size</code></p>
<P>

where <code>specification_rate</code> represents the disk specification for the transfer rate in megabytes per second.</p>
<P>

In this equation, you divide by the disk specification transfer rate to get seconds per megabyte, multiply by 1000 milliseconds per second, divide by 1,024,000 bytes per megabyte, and multiply by page size, in bytes.</p>
<P>

For example, if the specification rate for a disk drive is 3MB per second and a page size of 4K is used, you would calculate the transfer rate as follows:</p>
<P>

<code>Transfer rate = (1 / 3) &times; 1000 / 1024000 &times; 4096 &asymp; 1.3333 ms per page</code></p>
<P>

If a table space's containers aren't single physical disks but, instead, are arrays of disks (in other words, a RAID array), you'll need to take additional considerations into account when determining the appropriate transfer rate to use. If the array is relatively small, you can multiply the disk specification transfer rate (<code>specification_rate</code>) by the number of disks used, assuming the bottleneck is at the disk level. </p>
<P>

However, if the number of disks in the array is large, the bottleneck may not be at the disk level, but at one of the other I/O subsystem components (such as disk controllers, I/O buses, or the system bus). In this case, you can't assume that the I/O throughput capability is the product of the disk specification transfer rate and the number of disks. Instead, you must measure the actual I/O rate in megabytes per second during a sequential scan and divide the result by the number of containers that make up the table space.</p>
<P>

For example, a measured sequential I/O rate of 100MB per second for a table in a four container table space would imply 25MB per second per container, or a transfer rate of (1/25) &times; 1000 / 1024000 &times; 4096 = 0.16 ms per page.</p>
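<P>
Both transfer rate calculations, the single-disk case and the measured-throughput case, can be sketched as follows. The function names are illustrative, and the constant of 1,024,000 bytes per megabyte is taken directly from the equation above.</p>

```python
BYTES_PER_MB = 1_024_000  # constant used by the equation in the text


def transfer_rate_ms(rate_mb_per_s: float, page_size_bytes: int) -> float:
    """Milliseconds needed to read one page at the given MB/s rate."""
    return (1 / rate_mb_per_s) * 1000 / BYTES_PER_MB * page_size_bytes


def rate_per_container(measured_mb_per_s: float, containers: int) -> float:
    """Split a measured sequential I/O rate evenly across containers."""
    return measured_mb_per_s / containers


# One physical disk rated at 3MB per second, 4K pages
print(f"{transfer_rate_ms(3, 4096):.2f} ms per page")   # prints 1.33 ms per page

# 100MB per second measured across a four-container table space
per_container = rate_per_container(100, 4)
print(f"{transfer_rate_ms(per_container, 4096):.2f} ms per page")  # prints 0.16 ms per page
```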
<h3>File System Caching</h3>
<P>

With most file systems, a typical read operation involves moving data from storage into the file system cache and then copying the data from cache to the application buffer. Similarly, a write operation involves copying data from the application buffer to the file system cache, then copying the data from cache to storage. </p>
<P>

Because DB2 manages its own data caching via buffer pools, caching at the file-system level isn't needed if the buffer pools have been sized appropriately. And in some cases, caching at both the file-system level and in DB2 buffer pools can cause performance degradation because of the extra CPU cycles needed for the double caching. (One of the reasons raw I/O has been the preferred choice for I/O-intensive database workloads is its superior performance, which is due, in part, to the fact that it sidesteps the caching and locking mechanisms used by file systems.)</p>
<P>

To get around this issue, file-system vendors developed an alternate I/O mechanism, called <em>Direct I/O</em> (DIO), which tried to eliminate performance bottlenecks by bypassing caching at the file system level. IBM introduced its own file system feature called Concurrent I/O (CIO) for the Enhanced Journaling File System (JFS2) used with AIX 5L version 5.2.10. (On Windows, this functionality is provided by opening a file with the <code>FILE_FLAG_NO_BUFFERING</code> flag specified.) DIO and CIO are typically implemented at the system level via mount point options.</p>
<P>

Beginning with the version 8.2 release, DB2 provided support for DIO/CIO on AIX, and DIO on HP, Solaris, Linux, and Windows. DB2 also provided a way to bypass file system caching at the table space level, rather than at the file system level, by specifying the <code>NO FILE SYSTEM CACHING</code> clause with either the <code>CREATE TABLESPACE</code> statement or the <code>ALTER TABLESPACE</code> statement. Prior to version 9.5, the use of file system caching was implied and could be disabled by using this clause; starting with DB2 9.5, the use of file system caching is disabled by default when new DMS table spaces are created on AIX JFS2 file systems, Linux (with the exception of Linux for System z), Solaris, and Windows.</p>
<P>

IBM recommends avoiding the use of mount point options to implement DIO or CIO and using the <code>NO FILE SYSTEM CACHING</code> clause to disable file system caching at the table space level instead. When the <code>NO FILE SYSTEM CACHING</code> flag is applied to a table space, DB2 automatically takes advantage of DIO or CIO on file systems where this feature exists. </p>
<h3>Provisioning Control</h3>
<P>

As the need to store information continues to grow, so will the demand for deploying DB2 databases on network storage. How network storage is provisioned can have a significant effect on database performance, so don't hesitate to make recommendations to your storage administrator when you request storage space. </p>
<P>

Keep in mind that when a DB2 database is deployed in a NAS or SAN environment, you should set the <code>DB2_PARALLEL_IO</code> registry variable and table space characteristics to reflect the specific storage hardware and configuration used. By configuring DB2 and your storage subsystem together, I/O-related performance issues can be reduced or eliminated. </p>
<P>

<em>Special thanks to Aamer Sachedina, senior technical staff member at the IBM Toronto Lab, for sharing his presentation "Everything You Ever Wanted to Know about Storage, I/O and DB2 But Were Afraid to Ask" and for reviewing the material presented in this article.</em></p>
<hr width="60%"/>
<P>
<em>Roger E. Sanders, a consultant corporate systems engineer at EMC Corp., is the author of 17 books on DB2 for Linux, Unix, and Windows and teaches classes at many DB2 conferences. His latest book is titled</em> DB2 9 for Linux, UNIX, and Windows Advanced Database Administration: Certification Study Guide<em> (MC Press, 2008).</em></p>
<P>

				]]></body>
		</item>
	
		<item>
			<title><![CDATA[Informix DBA: Informix Performance Locking and Concurrency]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300349&cid=RSSfeed]]></link>
			<description><![CDATA[Improve concurrency and locking to provide faster data access for more users.]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:11 EDT</pubDate>
			<keywords><![CDATA[IBM Informix Dynamic Server, Informix DBA, Informix Performance, IDS Data Access, Locking, Concurrency]]></keywords>
			<blurb><![CDATA[Improve concurrency and locking to provide faster data access for more users.]]></blurb>
			<authors><![CDATA[Lester Knutsen]]></authors>
			<body><![CDATA[
			
					
<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/columns/knutsen_lester.jpg" alt="Lester Knutsen" width="90" height="90" border="1" class="Image_Float-Left" />
One important performance tuning step that's often overlooked is to check how well Informix Dynamic Server (IDS) is handling locks and concurrency. In the Advanced Informix Performance Tuning class I teach, tuning locks is one of the top five items we use to improve performance.</p>
<P>
<h3>Number of Locks</h3>
<P>

When Informix starts up, it reads the ONCONFIG file and uses the <code>LOCKS</code> parameter to create a memory structure (let's call it the lock table) to manage locks. The default setting in versions before IDS 11 was 2,000 entries, which is too small. In IDS 11, the default is 20,000 locks &mdash; better, but still not enough for high-volume systems. </p>
<P>

Each user session that opens a database, opens a table, or reads or updates rows generates locks in the lock table. Opening a database gets a shared lock on the database to prevent someone else from dropping the database. Opening a table gets one shared lock on the table to prevent that table from changing while it's in use. With the exception of "dirty reads," shared locks are placed on each row that is read. And when a row is updated, deleted, or inserted, additional locks are placed on indexes used by that row. </p>
<P>

Here's an example: Updating 1,000 rows with three indexes will place 1,000 row locks, 3,000 locks on indexes, and table and database locks for a total of 4,002 locks. This volume will quickly overflow the default lock-table structure in memory. Informix dynamically increases the size of the lock table when needed. However, the additional space for the lock tables in memory is in a different part of shared memory, which leads to a fragmented lock table. If the lock table overflows several times, searches can really slow down.</p>
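<P>
To get a feel for how quickly an update can consume the lock table, the arithmetic in this example can be sketched as a small function. It follows only the simple counting rule used above (one lock per row, one per index per row, plus the table and database locks); the function name is illustrative, and real lock counts depend on lock mode and isolation level.</p>

```python
def estimated_locks(rows_updated: int, indexes: int) -> int:
    """Rough lock count for an update, using the rule in the text."""
    row_locks = rows_updated
    index_locks = rows_updated * indexes
    table_and_database_locks = 2
    return row_locks + index_locks + table_and_database_locks


needed = estimated_locks(rows_updated=1000, indexes=3)
print(needed)  # prints 4002

# Compare against the pre-IDS 11 default of 2,000 lock-table entries
print(needed > 2000)  # prints True
```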
<P>

To diagnose lock-table overflow, look at the output of the <code>onstat -k</code> command. At the end of the output, you'll see how many times the lock table has overflowed. Figure 1 shows an example of a lock table that has overflowed two times. The last line shows that there are 42,239 current active locks and the total number of locks is 80,000. In this example, I would change the <code>LOCKS</code> parameter in the ONCONFIG file to 80,000 so that the table doesn't overflow. In the benchmarks we do in my Advanced Performance Tuning class, we sometimes see 30 to 40 overflows using the default values. That's why fixing this setting makes my list of top five performance improvements.</p>
<P>

Note: The <code>onstat -k</code> option will display all active locks, so the display could be very long. If you have a large <code>LOCKS</code> value defined in your ONCONFIG file and many users, you could see thousands of rows from this command. </p>
<h3>Lock Ownership</h3>
<P>

How do you find out which user has a lock on an object? The "owner" column in Figure 1 lists the address in shared memory of the user who owns a lock. Run <code>onstat -u</code> to list all users, and match the owner value against the "address" column in that output to identify the username of the owner.</p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_informixdba_fig1.jpg" width="400" height="164"></p>
<P>

<strong>Figure 1. onstat -k display of the lock table in memory.</strong></p>
<h3>Locked Tables</h3>
<P>

How do you find out which object is locked? The "tblsnum" column identifies the locked table. To find out which table that is, run the following SQL statement, which converts each table's partnum to hex, and look for the matching value in its output.</p>
<P>

<code>select tabname, hex(partnum) tblsnum from systables where tabid &gt; 99;</code></p>
<P>

This SQL statement will return a list of tables and their associated tblsnum. Figure 2 contains an example of how to identify which table is locked.</p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_informixdba_fig2.jpg" width="400" height="291"></p>
<P>

<strong>Figure 2. Identifying which table is locked.</strong></p>
<P>

The tblsnum 100002 has a special meaning &mdash; it indicates a database lock. Every user who opens a database will place a shared lock on the database. Figure 1 shows three database locks.</p>
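<P>
If you've collected partnum values as plain integers, the same conversion can be done outside the database. The sketch below assumes that the tblsnum column is simply the partnum in hex without leading zeros; the function names and the sample partnum are hypothetical.</p>

```python
DATABASE_LOCK_TBLSNUM = "100002"  # special value described in the text


def partnum_to_tblsnum(partnum: int) -> str:
    """Format an integer partnum as hex digits with no prefix."""
    return format(partnum, "x")


def normalize_hex(text: str) -> str:
    """Strip an optional 0x prefix and leading zeros for comparison."""
    value = text.strip().lower()
    if value.startswith("0x"):
        value = value[2:]
    return value.lstrip("0")


def is_database_lock(tblsnum: str) -> bool:
    """True when a tblsnum value denotes a database lock."""
    return normalize_hex(tblsnum) == DATABASE_LOCK_TBLSNUM


print(partnum_to_tblsnum(1048675))       # hypothetical partnum; prints 100063
print(normalize_hex("0x00100063"))       # prints 100063
print(is_database_lock("100002"))        # prints True
```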

				
					<h3>Lock Levels</h3>
<P>

Informix locks objects at the database, table, page, row, byte, and index key levels. You can identify a lock's level by looking at the table space, row ID, and key/byte columns in the <code>onstat -k</code> output. Table 1 lists the lock levels and how to identify them.</p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_informixdba_tab1.jpg" width="400" height="150"></p>
<P>

<strong>Table 1. Descriptions of lock levels.</strong></p>
<h3>Lock Types</h3>
<P>

The column "type" ("flags" in earlier releases) in the <code>onstat -k</code> output describes what type of lock is in effect. Table 2 lists the lock types.</p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_informixdba_tab2.jpg" width="400" height="200"></p>
<P>

<strong>Table 2. Descriptions of lock types shown in onstat -k output.</strong></p>
<h3>Concurrency: Allowing More Users to Share Data</h3>
<P>

Concurrency is all about how you get more users accessing and working with the same data without locking each other out. The simple way to lock everyone out is to lock the whole database in exclusive mode. But this approach isn't acceptable in a high-volume, multiuser environment. </p>
<P>

Informix offers the following five levels of concurrency, set with the <code>set isolation</code> SQL command:</p>
<P>

<strong>Dirty Read.</strong> This concurrency level doesn't lock any rows and may read rows that other users have locked and are changing. It can return uncommitted data that may be rolled back. This level is useful in a data warehouse environment or in any environment in which getting the data is more important than reading committed records.</p>
<P>

<strong>Committed Read.</strong> This level of concurrency doesn't lock any row but will fail if someone else has an update or exclusive lock on a row. It will only read committed rows. The row may be changed after it has been read, but must have no locks on it to be read. This level is the default for databases with logging and is the level required by most OLTP applications. However, you must provide error handling in the event that a user requests a row that is locked by another user.</p>
<P>

<strong>Cursor Stability.</strong> This level places a shared lock on a selected row so no other user can update the row a user is reading. The lock will be released as soon as another row is fetched or the cursor is closed.</p>
<P>

<strong>Repeatable Read. </strong>This level creates the most locks, because it will place a shared lock on all rows read or scanned by a user so that the rows won't change and repeating the read will return the same records and values. Locks are freed when the transaction is committed or rolled back. This level is the default for ANSI-mode databases.</p>
<P>

<strong>Last Committed Read. </strong>This isolation level, new in IDS 11, works much like a committed read; however, when a row is locked for update, IDS will read the last committed record from the logs. This level only works when a table is created with row-level locking, but it can greatly reduce locking errors and will return the last valid data.</p>
<h3>Performance Effects </h3>
<P>

Here's an example of how concurrency and isolation levels can affect locks and performance. Figure 3 shows a user locking a record with an update statement. Figure 4 shows the results of three SQL statements trying to read that locked row. The first statement using the default committed read fails; the next statement gets the data as it's being changed. However, the data may be rolled back or changed again before the lock is released. The last statement reads the last committed version of that row, which places no locks and gets valid data.</p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_informixdba_fig3.jpg" width="400" height="70"></p>
<P>

<strong>Figure 3. Locking a row.</strong></p>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_informixdba_fig4.jpg" width="400" height="292"></p>
<P>

<strong>Figure 4. Results of concurrency and locks.</strong></p>
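<P>
The three outcomes described above can be summarized in a toy model. This is not Informix code and makes no calls to the server; the <code>Row</code> class, the <code>LockedRowError</code> exception, and the <code>read_row</code> function are all hypothetical names used only to illustrate what each isolation level returns when another session holds an uncommitted update on a row.</p>

```python
from dataclasses import dataclass
from typing import Optional


class LockedRowError(Exception):
    """Raised when a reader cannot access a row locked for update."""


@dataclass
class Row:
    committed: str                      # last committed value
    uncommitted: Optional[str] = None   # pending change, if row is locked


def read_row(row: Row, isolation: str) -> str:
    """Return what a reader would see under the given isolation level."""
    locked = row.uncommitted is not None
    if isolation == "dirty read":
        # Ignores locks; may return data that is later rolled back
        return row.uncommitted if locked else row.committed
    if isolation == "committed read":
        # Fails rather than reading a row locked for update
        if locked:
            raise LockedRowError("row is locked by another user")
        return row.committed
    if isolation == "last committed read":
        # Returns the last committed version even while the row is locked
        return row.committed
    raise ValueError(f"isolation level not modeled: {isolation}")


row = Row(committed="old value", uncommitted="new value")
print(read_row(row, "dirty read"))           # prints new value
print(read_row(row, "last committed read"))  # prints old value
try:
    read_row(row, "committed read")
except LockedRowError as err:
    print("error:", err)                     # prints error: row is locked by another user
```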
<h3>More Data, Faster</h3>
<P>

Take a look at how many locks your system is using and learn about the new isolation level Last Committed Read. Tuning locks will let users get data faster; by choosing the right isolation setting, you'll be able to provide more users access to your data. </p>
<hr width="60%" />
<P>

<em><a href="mailto:lester@advancedatatools.com">Lester Knutsen</a> is president of Advanced DataTools Corp., an IBM Informix consulting and training partner specializing in data warehouse development, database design, performance tuning, and Informix training and support. He is president of the Washington D.C. Area Informix User Group, a founding member of the International Informix Users Group, and an IBM Gold Consultant. </em></p>
				]]></body>
		</item>
	
		<item>
			<title><![CDATA[Programmers Only: DB2 Answers]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300355&cid=RSSfeed]]></link>
			<description><![CDATA[Compression questions and a sticky DB2 for z/OS SORT mystery are put to rest.]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:10 EDT</pubDate>
			<keywords><![CDATA[DB2 for z/OS Sort Syntax, Order By Clause, Programmers Only, Bonnie Baker, DB2 SQL, Buffer Pools, Compression, Dynamic Data, Compression Costs]]></keywords>
			<blurb><![CDATA[Compression questions and a sticky DB2 for z/OS SORT mystery are put to rest.]]></blurb>
			<authors><![CDATA[Bonnie Baker]]></authors>
			<body><![CDATA[
			
					
<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/columns/bonnie_baker.jpg" alt="Bonnie Baker" width="90" height="90" class="Image_Float-Left" border="1" />
I receive many questions via email, often the same ones again and again. I'll use this column to answer a few of the most common.</p>
<h3>Question 1: The Real SORT Mystery</h3>
<P>

In Example 4 in my last column ("<a href="http://ibmdatabasemag.com/showArticle.jhtml?articleID=211300275">The Mystery of DB2 Sorts</a>"), I wrote about how an index could be used for the following SQL statement: </p>
<P>

<code>Select workdept, lastname, jobcode<br />
&#160;&#160; from employee_master<br />
Where workdept in ('A01', 'B22', 'B46')<br />
&#160;&#160; and lastname &gt;= :hvlastname<br />
Order by lastname<br /></code><br/><br />
I wrote: </p>
<blockquote>
  
<P>
Using the third index &#91;on <code>jobcode</code>, <code>workdept</code>, <code>lastname</code>&#93;, DB2 could not match on either predicate, but it could apply the predicates to the index data by screening (not matching) on <code>lastname</code> and <code>workdept</code>. For each index row that qualified, DB2 could retrieve the three selected columns from the index itself, thereby avoiding any reads to the table. <em>And since the first column of the index is <code>lastname</code>, the data would be in <code>lastname</code> order. No <code>SORT</code> would be needed.</em></p>
</blockquote>
<P>

And you asked, "But the first column of Index 3 is not <code>lastname</code>. How could the data be returned in <code>lastname</code> order without a data sort?" Now <em>there</em> is a mystery.</p>
<P>

I received this question from at least 200 readers. For me, this was good news and bad. The good news is that folks out there are actually reading my column &mdash; and reading it <em>very</em> closely. The bad news is that by the time the last of the emails arrived, the egg on my face was very sticky.</p>
<P>

What happened was this. I started out with a totally different fictional third index and a different point I wanted to make. I then changed my mind about the index and the point to make. I changed the index, and then I overtyped an existing paragraph, inadvertently forgetting to delete the last two sentences (the ones italicized above). </p>
<P>

So, what was the point I wanted to make? This: When the access path that DB2 chooses is index only, DB2 ignores the <code>CLUSTERRATIO</code> of the index. For index-only access, DB2 doesn't care about the relationship between the order of the index and the order of the table. DB2 will never choose to use the index the List Prefetch way. Why? There's no point in doing a <code>RID SORT</code> to make the reads to the table more sequential and less random because there will be <em>no reads to the table</em>. </p>
<P>

In this example, DB2 will do a full index space scan, read every single index row using sequential prefetch, apply the two predicates to every row and, for qualified rows, retrieve all three columns from the index data. The data will <em>not</em> be in <code>lastname</code> order. Therefore, DB2 will have to do a <code>SORT</code> to satisfy the <code>ORDER BY</code> clause.</p>
<P>

Now I think I'll go wash that egg off my face.</p>
<h3>Question 2: Buffer Pools and Compression</h3>
<P>

You've asked many questions about compression and how data is addressed in the buffer pool. For example, one question was, "If my table space is defined with <code>COMPRESS(YES)</code>, when is the data decompressed &mdash; before it is put into the buffer pool or after?"</p>
<P>

The answer to this simple question is actually very complicated. </p>

				
					
<P>
First of all, just because the table space is defined with <code>COMPRESS(YES)</code> doesn't mean that every row will actually be compressed. DB2 must be given an opportunity to build a customized compression dictionary for the data in the table space and an opportunity to use that dictionary for each row. </p>
<P>

Even then, every row may not be compressed. Why? During compression, each byte (eight bits) is compressed to as little as one bit and to as many as 12 bits. The more common data will be compressed to the smaller number of bits (one, two, three, four&hellip;) and the rarer data will be compressed to the larger number of bits (eight, nine, ten, eleven&hellip;). The net should be a shorter row. However, if a row happens to have a modicum of common data and an excess of rare data, the compressed row may actually be <em>longer</em> than the uncompressed row. When this happens, DB2 will toss out the compressed image and store the shorter, uncompressed row. So, for example, on a single page containing ten rows, nine may be compressed and one uncompressed.</p>
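<P>
The "keep whichever image is shorter" rule can be illustrated with a few lines of code. Note the loud assumption here: this sketch uses Python's general-purpose <code>zlib</code> module purely as a stand-in, not the dictionary-based compression DB2 actually uses, and the function name <code>stored_image</code> is hypothetical.</p>

```python
import zlib
from typing import Tuple


def stored_image(row: bytes) -> Tuple[bool, bytes]:
    """Return (is_compressed, bytes_to_store) for one row.

    The compressed image is kept only when it is strictly shorter
    than the original row; otherwise the row is stored as-is.
    """
    compressed = zlib.compress(row)  # stand-in for DB2's compression
    if len(row) > len(compressed):
        return True, compressed
    return False, row


common_row = b"A" * 200          # highly repetitive data
rare_row = bytes(range(200))     # 200 distinct byte values, no repeats

for name, row in (("common", common_row), ("rare", rare_row)):
    is_compressed, image = stored_image(row)
    print(name, "compressed" if is_compressed else "uncompressed", len(image))
```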
<P>

Second, the pages are brought into the buffer pool the same way they are written to disk. Compressed rows are still compressed; uncompressed rows are still uncompressed.</p>
<P>

Third, whether or not the compressed rows are ever decompressed depends upon the SQL and what DB2 needs to do to apply predicates and give you back what you want to see.</p>
<P>

Let's look at a few examples:</p>
<P>

SQL #1: <code>DELETE FROM POMASTER WHERE PONBR = :HVPONBR</code></p>
<P>

The primary key of the <code>POMASTER</code> table is <code>PONBR</code>. DB2 can fully qualify the row to be deleted by applying the <code>WHERE</code> clause predicate to the index. No predicates have to be applied to the table. In other words, this is an <code>INDEX_ONLY</code> delete (see <a href="http://ibmdatabasemag.com/showArticle.jhtml?articleID=209900061">"The Mystery of DB2 for z/OS Index-Only Updates and Deletes"</a> in Resources). </p>
<P>

There is no need for the table row to be decompressed to be deleted. And as an extra performance bonus, the <code>UNDO</code> and <code>REDO</code> log records can be created using the compressed row. </p>
<P>

There is an exception. Some companies use their log records to derive information. Because these companies don't want their log records to be compressed, they use a feature called <code>DATACAPTURE</code> to tell DB2 to log uncompressed images of each row.</p>
<P>

SQL #2: <code>SELECT PONBR FROM POMASTER WHERE CUSTNO = :HVCUSTNO</code></p>
<P>

On the <code>POMASTER</code> table, there's a two-column index on <code>CUSTNO</code>, <code>PONBR</code>. DB2 can apply the single predicate to the index <em>and</em> retrieve a list of all the purchase order numbers for the customer by reading just the index data. This is true <code>INDEX-ONLY</code> access, and the table rows won't even be read, much less decompressed.</p>
<P>

SQL #3: <code>UPDATE POMASTER SET FLAG = :HVFLAG WHERE PONBR = :HVPONBR</code></p>
<P>

Again, DB2 can apply the predicate on <code>PONBR</code> to the index data. It then uses the RID (containing information about both the table page and the row location) to do the <code>GET PAGE</code> request. After the page (containing compressed and possibly uncompressed rows) is put into the buffer pool, the desired row is located and the row header is inspected. The row header indicates whether this particular row is compressed. Our row is compressed. Only our row must be decompressed in order for DB2 to update the <code>FLAG</code>. The row is then recompressed, and the <code>UNDO</code> and <code>REDO</code> log records are created using the compressed format (again taking into consideration <code>DATACAPTURE</code>).</p>
<h3>Question 3: Compression Costs for Dynamic Data</h3>
<P>

You asked, "Doesn't compression cost too much if my table data is dynamic?"</p>
<P>

Again, the answer is, "It depends." Dynamic in what way? Is the data refreshed daily but essentially read-only for SQL? Is the number of rows in the table steadily increasing but, except for the <code>LOAD RESUME</code> utility or the mass <code>INSERT</code> program, essentially read-only data? How good is your compression ratio? Are you getting twice as many rows per page during your inserts? Will your <code>GET PAGE</code> requests possibly be cut in half because all of your table rows now fit on half a million pages instead of a million? Are most of your deletes <code>INDEX ONLY</code>? Do you fully qualify most of your selected and updated rows using index data so that very few predicates are ever applied to table data? Or, are many predicates applied to table data before the row can be accepted or rejected?</p>
<P>

There are so many benefits to compression that I have found that nine times out of ten, the cost of compressing rows and decompressing rows is more than paid for by:</p>
<ul>
  <li>The increase in the number of rows per page, resulting in a reduction in <code>GET PAGES</code> and <code>READ I/O</code>s</li>
  <li>The increased number of rows that can be <code>INSERT</code>ed on each page</li>
  <li>The greater number of rows that will fit within the <code>PCTFREE</code> and <code>FREEPAGE</code> quantities and the reduction in the <code>PCTFREE</code> and <code>FREEPAGE</code> quantities needed</li>
  <li>The longer amount of time that table data stays in <code>CLUSTER</code></li>
  <li>The reduction in the number of <code>REORG</code>s needed</li>
  <li>The reduced search time during <code>INSERT</code> logic</li>
  <li>The increase in the number of <code>LOG</code> records that will fit on a <code>LOG</code> page</li>
  <li>The faster <code>ROLLBACK</code> and <code>RECOVERY</code></li>
  <li>The reduction in the number of <code>LOG</code> pages that have to be flushed to disk</li>
  <li>And, of course, the reduction in the amount of disk space required. </li>
</ul>
<P>

In my experience, with rare exception, compression that results in a significant reduction in pages per table more than pays for itself even for highly maintained data. </p>
<h3>Bring on the Ideas</h3>
<P>

Thanks for spotting my glitches and keeping me straight. Keep your questions coming. The hardest part of writing this column is coming up with a topic that I think you might want to learn more about. Your ideas are always welcome. </p>
<hr width="60%" />
<P>

<em>Bonnie Baker accepts requests to teach private classes for corporations and DB2 User Groups. She specializes in applications performance and version transition training on the DB2 for z/OS platform. She is an IBM DB2 Gold Consultant, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her series of seminars entitled "Things I Wish They'd Told Me 8 Years Ago" and for writing the "Programmers Only" column. She can be reached through Bonnie Baker Corporation at 1-813-837-3393 or <a href="mailto:bkbaker@bonniebaker.com">bkbaker@bonniebaker.com</a>. </em></p>
				]]></body>
		</item>
	
		<item>
			<title><![CDATA[Emerging Technologies: Cool Shades of Big Blue]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300358&cid=RSSfeed]]></link>
			<description><![CDATA[The information management community gets socialized on Web 2.0.]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:09 EDT</pubDate>
			<keywords><![CDATA[Information Management Social Networking, Twitter, Web 2.0, IDUG, IIUG, DB2, IMS]]></keywords>
			<blurb><![CDATA[The information management community gets socialized on Web 2.0.]]></blurb>
			<authors><![CDATA[Lindsay M. Furman]]></authors>
			<body><![CDATA[
			
					
<P>
IBM has a reputation as a trusted company, an innovator of top technology, and a reliable brand. The company is also a growing presence in the social networking sphere. You read that correctly &mdash; IBM is establishing an important niche in the constantly evolving space that is Web 2.0. </p>
<P>

Maybe you think the realm of social networking is for connecting with friends on Facebook, sharing videos of life's funny moments on YouTube, or establishing professional connections on LinkedIn. You're right &mdash; social networking is for all of those things. But recently, Web 2.0 has developed from one's own personal space into something much larger. </p>
<P>

Social networking communities once geared toward students and teens are now becoming more business oriented. Facebook, previously populated mainly by a college-aged crowd, now boasts business professionals and IT types among its members. On YouTube.com, where once you'd find mainly amateur videos shot to entertain friends and family, you'll now find videos and whole channels created for professional interest groups and communities, political campaigns, and corporations. This shift to a business-oriented Web 2.0 is having a big effect on customers, on businesses, and on communities.</p>
<P>
Strong community support for IBM products pre-dates the social networking era, as the robust participation in the <a href="http://idug.org" target="_blank">International DB2 Users Group (IDUG)</a> and the <a href="http://iiug.org" target="_blank">International Informix Users Group (IIUG)</a> indicates. The rise of social networking sites gives these and other communities more options for participation than ever. </p>
<P>

To claim its space in the social networking sphere, IBM has developed many online communities that personalize the company and give customers access to many of its employees. These communities help to connect customers across the globe. </p>
<P>

If you join a Facebook group dedicated to your favorite rock band, why not join a group dedicated to your favorite database? If you like to watch funny videos on YouTube, why not check out IBM's amusing viral video series? If you connect with friends and coworkers online, why not connect with fellow data management aficionados? </p>
<P>

IBM has asked and answered these questions by using the social networking arena as a new setting to build on the already strong data management community. If you're interested in becoming part of this ever-growing community, or, if you're already an active participant and want to know what else is out there, visit one of the numerous groups, blogs, or networks. Take this opportunity to make your mark on the data management community.</p>
<hr width="60%"/>
<P>

<em>Lindsay M. Furman works on Information On Demand client references and marketing productions in the IBM Software Group.</em></p>
<P>
<div class="Article_Sidebar_Larger">
<h2>Who's On Twitter</h2>
<P>

This list is a small sample of the folks you'll find "twittering" on any given day. For a more complete list (or to add yourself) see the Who's Who in Information Management Social Networking page in the <a href="http://wiki.ibmdatabasemag.com/index.php/Who%27s_Who_in_Information_Management_Social_Networking"><em>IBM Database Magazine</em> wiki</a>.</p>
<h3>DB2 </h3>
<ul>
  <li><a href="http://twitter.com/ravahuja" target="_blank">Rav Ahuja</a></li>
  <li><a href="http://twitter.com/ebennerdotcom" target="_blank">Jeffrey Benner</a></li>
  <li><a href="http://twitter.com/acangiano" target="_blank">Antonio Cangiano</a></li>
  <li><a href="http://twitter.com/db2" target="_blank">DB2</a> </li>
  <li><a href="http://twitter.com/wfavero" target="_blank">Willie Favero</a></li>
  <li><a href="http://twitter.com/cuneytg" target="_blank">C&#252;neyt G&#246;ksu</a> </li>
  <li><a href="http://twitter.com/shayes" target="_blank">Scott Hayes</a></li>
  <li><a href="http://twitter.com/katsnelson" target="_blank">Leon Katsnelson</a></li>
  <li><a href="http://twitter.com/craigmullins" target="_blank">Craig Mullins</a></li>
  <li><a href="http://twitter.com/db2fred" target="_blank">Fred Sobotka</a> </li>
  <li><a href="http://twitter.com/jstuhler" target="_blank">Julian Stuhler</a></li>
</ul>
<h3>IMS </h3>
<ul>
  <li><a href="http://twitter.com/dougielawson" target="_blank">Dougie Lawson</a> </li>
</ul>
<h3>Information Management </h3>
<ul>
  <li><a href="http://twitter.com/ibmdatabasemag" target="_blank"><em>IBM Database Magazine</em></a> </li>
  <li><a href="http://twitter.com/kmoutsos" target="_blank">Kim Moutsos</a> </li>
</ul>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_social_twitter.jpg" alt="" /></p>
</div>
<P>

&nbsp;</p>
<P>
<div class="Article_Sidebar_Larger">
<h2>IBM Data Management Communities</h2>
<h3>User Groups</h3>
<ul>
  <li>The International DB2 Users Group Community: <br />
    <a href="http://www.idug.org" target="_blank">www.idug.org</a></li>
  <li> The International Informix Users Group Community: <br />
    <a href="http://www.iiug.org" target="_blank">www.iiug.org</a></li>
</ul>
<h3>Data Management Networking Sites and Blog Aggregators</h3>
<ul>
  <li><a href="http://channeldb2.com" target="_blank">channeldb2.com</a></li>
  <li><a href="http://www.planetdb2.com" target="_blank">planetdb2.com</a></li>
  <li><a href="http://www.planetids.com" target="_blank">planetids.com</a></li>
</ul>
<h3>Facebook Groups and Pages</h3>
<ul>
  <li> <a href="http://www.facebook.com/group.php?gid=35639913751" target="_blank"><em>IBM Database Magazine</em> Group</a></li>
  <li> <a href="http://www.facebook.com/pages/IBM-DB2/10442975871?ref=ts" target="_blank">IBM DB2 Fan Page</a></li>
  <li> <a href="http://www.facebook.com/pages/IBM-IMS/38404374060?ref=ts" target="_blank">IBM IMS Fan Page</a></li>
  <li> <a href="http://www.facebook.com/group.php?gid=2249729222" target="_blank">IBM Informix Dynamic Server (IDS) Fan Page</a></li>
  <li> <a href="http://www.facebook.com/pages/IBM-InfoSphere-Warehouse/17628877758?sid=ead596e4fc67fe2caeaa2e586b086efd&ref=s" target="_blank">IBM InfoSphere Warehouse Fan Page</a></li>
  <li> <a href="http://www.facebook.com/pages/IBM-Optim/37213992975?sid=00519879e09c12f12c0b439177410fb0&ref=s" target="_blank">IBM Optim Fan Page</a></li>
</ul>
<h3>YouTube Channels</h3>
<ul>
  <li> IBM Data Management Videos Channel: <br />
    <a href="http://www.youtube.com/user/IBMer5985" target="_blank">www.youtube.com/user/IBMer5985</a></li>
  <li> IBM Channel:<br />
    <a href="http://www.youtube.com/user/reviewIBM" target="_blank">www.youtube.com/user/reviewIBM</a></li>
  <li> IBM Data Management Viral Video Series: <br />
    <a href="http://www.youtube.com/user/pureXML" target="_blank">www.youtube.com/user/pureXML</a></li>
</ul>
<h3>LinkedIn Groups</h3>
<ul>
  <li> <a href="http://www.linkedin.com/groups?gid=45375" target="_blank">DB2 Professionals</a></li>
  <li> <a href="http://www.linkedin.com/groups?gid=122710&trk=anetsrch_name&goback=.gdr_1237928996036_1" target="_blank">DB2 UDB DBA</a></li>
  <li> <a href="http://www.linkedin.com/groups?gid=55779&trk=anetsrch_name&goback=.gdr_1237928996038_1" target="_blank">Mainframe Experts Network</a></li>
  <li> <a href="http://www.linkedin.com/groups?gid=48644&trk=anetsrch_name&goback=.gdr_1237928996040_1" target="_blank">Information Management Enthusiasts</a></li>
  <li> <a href="http://www.linkedin.com/groups?gid=127956&trk=anetsrch_name&goback=.gdr_1237928996042_1" target="_blank">DB2-SAP</a></li>
  <li> <a href="http://www.linkedin.com/groups?gid=25049&trk=anetsrch_name&goback=.gdr_1237928996044_1" target="_blank">Informix Supporter</a></li>
  <li> <a href="http://www.linkedin.com/groups?gid=122715&trk=anetsrch_name&goback=.gdr_1237928996046_1" target="_blank">Informix Related People</a></li>
  <li> <a href="http://www.linkedin.com/groups?gid=1350&trk=anetsrch_name&goback=.gdr_1237928996048_1" target="_blank">U2 Users Group</a></li>
</ul>
<P>

<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dbt14n1_social_facebook.jpg" alt="" width="400" height="300" /></p>
<P>
</div>
<P>

				]]></body>
		</item>
	
		<item>
			<title><![CDATA[Skills Zone: DB2 Study Guides]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300362&cid=RSSfeed]]></link>
			<description><![CDATA[Advice on evaluating the many DB2 certification books on the market. ]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:08 EDT</pubDate>
			<keywords><![CDATA[IBM DB2 Certification Process, Howard Fosdick, IBM Database Magazine Bookstore, Database Administrator Exam, DB2 Certification Study Guide, DB2 DBA, DB2 9]]></keywords>
			<blurb><![CDATA[Advice on evaluating the many DB2 certification books on the market. ]]></blurb>
			<authors><![CDATA[Howard Fosdick]]></authors>
			<body><![CDATA[
			
					
<P>
<img src="http://i.cmpnet.com/v2.db2mag.com/columns/howard_fosdick.jpg" alt="Howard_Fosdick" width="90" height="90" class="Image_Float-Left" border="1" />
Along with browsing Web resources, trying practice tests, and getting hands-on experience, careful study of one or more DB2 books is essential to passing DB2 certification exams. But how do you choose the right book? </p>
<P>

Many online resources list DB2 certification books and offer detailed reviews and summaries to help you decide which to pick.</p>
<P>

I recently updated an article I wrote for developerWorks (called <a href="http://ibm.com/developerworks/db2/library/techarticle/dm-0401fosdick">"DB2 Certification: Everything You Need to Know"</a>) to include a table that shows DB2 9 certification study guides by exam. Each title links to the book's Amazon entry. Look beyond the "star rating" to the customer-written book reviews. You'll sometimes find certification hints among the reviews (most candidates evaluate the books after taking the exams). And user comments and descriptions may help you choose among alternatives.</p><div class="Article_Sidebar_Float-Right" id="Article_Sidebar"> 
<h2>Browsing the Aisles</h2>
<P>

DB2 books are available from many online sources. Start your comparison shopping with this handy reference.</p>
<h3><a href="http://ibmdatabasemag.com/bookstore"><em>IBM Database Magazine</em> Bookstore</a></h3>
<P>

Find lists of certification study guides, books by <em>IBM Database Magazine</em> authors, and print and electronic books on many DB2 and IT-related topics. </p>
<ul>
  <li><a href="http://ibmdatabasemag.com/bookstore/certification.shtml">Certification guides</a>
    </li>
  <li><a href="http://ibmdatabasemag.com/bookstore/db2authors.shtml">Books by <em>IBM Database Magazine</em> authors</a></li>
  <li><a href="http://ibmdatabasemag.com/electronicbooks">Electronic books</a>
    </li>
  <li><a href="http://ibmdatabasemag.com/bookstore/top_infomngmt2008.shtml">Top books of 2008</a></li>
</ul>
<h3><a href="http://ibm.com/software/data/education/bookstore">IBM Information Management Bookstore</a></h3>
<P>

Find study guides on various DB2 versions plus general information management topics. </p>
<ul>
  <li><a href="http://ibm.com/software/data/education/bookstore/certify.html">Certification guides</a></li>
  <li><a href="http://publib-b.boulder.ibm.com/Redbooks.nsf/Portals/Software">IBM Redbooks</a></li>
  <li><a href="http://ibm.com/software/data/sw-library">IBM Information Management Software Library</a></li>
</ul></div>
<P>

IBM's Information Management Bookstore offers another table of certification books. This list mixes books covering different DB2 versions under a single heading, so be sure you pick titles that cover the relevant exam. The list also mixes study guides &mdash; books written specifically for certification preparation &mdash; with more general books. If you plan to take an exam, you'll definitely want to buy a DB2 study guide that drills down to exam specifics. Yet it's often a good idea to also seek perspectives from books that cover more than just the exam subjects. </p>
<P>

The bookstore at ibmdatabasemag.com recommends titles and explains what each book covers. There you'll find links to certification study guides, IBM Redbooks, and IBM's data management library.</p>
<P>

When deciding which books to buy for DB2 certification, consider several factors:</p>
<ul>
  <li><strong>Quality. </strong>Amazon's reader reviews and star ratings are useful. Most DB2 study guides are well received by readers, so look through reviews for clues as to which books include elements you consider important.</li>
  <li><strong>Sample tests.</strong> You need to take practice tests to prepare for the exams; therefore, books that include tests offer a big advantage. </li>
  <li><strong>Coverage. </strong>You don't want to overlook any test topics. Purchase at least one study guide and you won't omit anything.</li>
</ul>
<P>

Once you've got the books in hand, all you have left is the challenging part: the studying. </p>
<hr width="60%" />
<P>

<P>
<em>Howard Fosdick is an IBM-certified DB2 DBA. His book</em> Rexx: Programmer's Reference <em>covers everything Rexx, including database programming, and is available at <a href="http://amazon.com/rexx">amazon.com/rexx</a>.</em></p>
<P>

				]]></body>
		</item>
	
		<item>
			<title><![CDATA[News Bytes]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300363&cid=RSSfeed]]></link>
			<description><![CDATA[IBM and partner news about information management products and solutions. ]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:05 EDT</pubDate>
			<keywords><![CDATA[DB2 pureXML, Cloud Computing, Informix Dynamic Server, DB2 9, Informix, UniVerse]]></keywords>
			<blurb><![CDATA[IBM and partner news about information management products and solutions. ]]></blurb>
			<authors><![CDATA[]]></authors>
			<body><![CDATA[
			
					<h2><a href="http://ibm.com/developerworks/spaces/cloud" target="_blank">DB2 and Informix Head for the Cloud </a></h2>
<P>

IBM is now offering software on a "pay as you go" basis through Amazon Web Services (AWS). The new model provides access to development and production instances of IBM DB2, Informix Dynamic Server, WebSphere Portal, Lotus Web Content Management, WebSphere sMash, and Novell's SUSE Linux operating system software in the Amazon Elastic Compute Cloud (Amazon EC2) environment.<br/><br />New Amazon Machine Images (AMIs) are available at no charge for development and test purposes, enabling software developers to quickly build preproduction applications based on IBM software within Amazon EC2. The new portfolio will extend over time to include service management capabilities from IBM Tivoli software for Amazon EC2 for better control of and more automation in dynamic infrastructures in the cloud.</p>
<P>

The IBM software images for full production running in Amazon EC2 will be launched in beta in the coming months, with pricing to be announced. Developers and customers will be able to run development and production instances of IBM software for an hourly price per instance. Customers will also be able to run their already-purchased IBM software on Amazon EC2. </p>
<P>

For details or to access the new development AMIs, go to the new <a href="http://ibm.com/developerworks/spaces/cloud" target="_blank">IBM Cloud Space</a> on developerWorks or the IBM section of the <a href="http://aws.amazon.com/solutions/featured-partners/ibm" target="_blank">AWS Featured Partners page</a>. </p>
<hr/>
<h2>Data Servers and Tools</h2>
<h3><a href="http://ibm.com/informix/warehouse" target="_blank">Informix Dynamic Server Gets New Data Warehouse Features</a></h3>
<P>

New capabilities for Informix Dynamic Server (IDS) let companies combine warehouse and operational data in the same platform. Combining these capabilities in an existing Informix infrastructure enables Informix customers, a group that includes eight of the top 10 U.S. retailers and 95 percent of global telecommunications companies, to maximize opportunities while reducing expenses. Companies can use a single database for both transactions and analytics or build a separate data warehouse, depending on workload requirements. </p>
<P>

Customers in all industries, including healthcare, retail, and manufacturing, can take advantage of this instantaneous access to information with analytical capabilities to understand market and customer trends. For example, the new software can help retailers easily access and organize data from store, Web, and catalog sales and from internal inventory and merchandising systems to help analyze future buying trends in order to make smarter business decisions.</p>
<P>

With the new feature, Informix users can build end-to-end business intelligence and reporting solutions using data from various sources, including IDS. They can use front-end analysis and reporting tools, like IBM Cognos, or develop mashups and other dashboards. </p>
<P>

The Informix Warehouse Feature includes the SQL Warehouse (SQW) tool that has been integrated with IDS V11.50. SQW includes the following components:</p>
<ul>
  <li>Design Studio</li>
  <li>SQL Warehousing Tool</li>
  <li>Warehouse Administration Console.</li>
</ul>
<P>

For details, go to <a href="http://ibm.com/informix/warehouse" target="_blank">ibm.com/informix/warehouse</a>.</p>
<h3><a href="http://ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&amp;subtype=ca&amp;appname=Demo&amp;htmlfid=897/ENUS909-049" target="_blank">DB2 and Informix Packaging Changes</a> </h3>
<P>

IBM is simplifying the packaging of its DB2 and Informix Dynamic Server (IDS) database software. Several paid features such as DB2 pureXML and spatial features are moving into the core DB2 and IDS editions without changing the price of the core editions. By simplifying the packaging and pricing of DB2, IDS, and InfoSphere Warehouse (powered by DB2), IBM is making it easier for customers to manage their licenses and plan their budgets.</p>
<P>

For details, go to <a href="http://ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&amp;subtype=ca&amp;appname=Demo&amp;htmlfid=897/ENUS909-049" target="_blank">ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&amp;subtype=ca&amp;appname=Demo&amp;htmlfid=897/ENUS909-049</a>.</p>
<P>
<h3><a href="http://ibm.com/software/data/u2/universe/universe10-3.html" target="_blank">IBM UniVerse 10.3 Launches </a></h3>
<P>

IBM UniVerse 10.3 adds powerful development capabilities to the nested relational (multivalue) database management system. New features include:</p>
<ul>
  <li>The Eclipse-based IBM UniData and UniVerse Basic Developer Toolkit (BDT), an integrated development and debugging tool for Basic developers </li>
  <li>The UniObjects for .NET Compact Framework (UO.NET for CF), which allows developers to deliver secure downloadable applications on the Windows CE platform</li>
  <li>Streamlined installation</li>
  <li>Improved usability and searchability of the UniVerse readme</li>
  <li>Support for multiple languages through enhancements to XML encoding</li>
  <li>Improvements to file tools</li>
  <li>Enhancements to transaction logging functionality</li>
  <li>Certification for Windows 2008.</li>
</ul>
<P>

For details, go to <a href="http://ibm.com/software/data/u2/universe/universe10-3.html" target="_blank">ibm.com/software/data/u2/universe/universe10-3.html</a>. </p>
<h3><a href="http://ibm.com/db2/technology-sandbox" target="_blank">IBM Unveils the DB2 Technology Sandbox</a> </h3>
<P>

IBM recently opened the DB2 Technology Sandbox, which provides early access to technology under development for future versions of DB2. The sandbox is a great opportunity to download and play with new product features that are under development, provide feedback to help shape these technologies, and get access to the latest information from the IBM labs. </p>
<P>

The DB2 Technology Sandbox currently includes enhancements to: </p>
<ul>
  <li>compression </li>
  <li>manageability </li>
  <li>security </li>
  <li>performance </li>
  <li>DB2 pureXML. </li>
</ul>
<P>

VMware is offering partners in the DB2 Technology Sandbox more than $14,000 worth of VMware software licenses and technical support for free. </p>
<P>

For details, go to <a href="http://ibm.com/db2/technology-sandbox" target="_blank">ibm.com/db2/technology-sandbox</a>. </p>
<P>
<h3><a href="http://ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&amp;subtype=ca&amp;appname=Demo&amp;htmlfid=897/ENUS209-043" target="_blank">DB2 and Informix Extend Virtualization Support</a> </h3>
<P>

IBM now supports virtualization across its entire database software portfolio, from entry-level through enterprise databases, from departmental data marts through dynamic enterprise warehouses. All DB2 and Informix editions can now be used, licensed, and supported in virtualized environments. IBM is the first major database software vendor to announce support for virtualization across the entire spectrum of database software. </p>
<P>

For details, go to <a href="http://ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&amp;subtype=ca&amp;appname=Demo&amp;htmlfid=897/ENUS209-043" target="_blank">ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&amp;subtype=ca&amp;appname=Demo&amp;htmlfid=897/ENUS209-043</a>.</p>
<P>
<h3><a href="http://ibm.com/db2/express/download.html" target="_blank">DB2 Express-C Shines on the Mac</a> </h3>
<P>

IBM issued a beta release of DB2 Express-C for Mac OS X in response to the growing community of developers who consider Mac OS X their operating system of choice for working in Ruby, Python, PHP, and other languages. </p>
<P>

To download DB2 Express-C for Mac OS X, go to <a href="http://ibm.com/db2/express/download.html" target="_blank">ibm.com/db2/express/download.html</a>. </p>
<P>
<h3><a href="http://hitsw.com" target="_blank">HiT Software's Ritmo Boosts .NET Enterprise Productivity</a> </h3>
<P>

Ritmo .NET data providers for IBM DB2/IBM i databases improved developer productivity at Seaboard Marine and Morgan Corp., according to HiT Software. </p>
<P>

Programmers at ocean transportation company Seaboard Marine who use Ritmo's C# Toolkit reduced to minutes the time it takes to test and manage the DB2 data connections required by .NET applications. Morgan Corp., which manufactures dry freight and refrigerated truck bodies, uses Ritmo to provide the fastest possible performance between its IBM i data and a Windows server to support a business-critical, enterprise-wide information portal. </p>
<P>

HiT's Ritmo products provide 100-percent managed, high-performance access to DB2 data from Windows .NET applications. </p>
<P>

For details, go to <a href="http://hitsw.com" target="_blank">hitsw.com</a>. </p>
<h3><a href="http://datadirect.com/products/net/net_for_db2" target="_blank">DataDirect Connect for ADO.NET</a></h3>
<P>

The latest release of DataDirect Connect for ADO.NET lets developers write and deploy secure, efficient .NET enterprise applications that connect to multiple databases, including DB2. This version includes new features and tuning options for each major database and introduces DataDirect Bulk Load, a flexible, common API-based implementation of bulk-load functionality for the .NET platform that provides consistent semantics across all supported databases. </p>
<P>

DataDirect's suite of ADO.NET data providers uses a 100-percent managed code architecture, eliminating the need for database client libraries. Because managed code runs in the Common Language Runtime (CLR) environment, it reduces risks and closes holes that unmanaged code leaves exposed. </p>
<P>

For details, go to <a href="http://datadirect.com/products/net/net_for_db2">datadirect.com/products/net/net_for_db2</a>.</p>
<h3><a href="http://imperva.com" target="_blank">SecureSphere Database Gateway for z/OS</a> </h3>
<P>

The new SecureSphere Database Gateway for z/OS (DGZ) from application data security provider Imperva provides comprehensive monitoring, auditing, and protection for DB2 databases running on z/OS systems. Imperva DGZ monitors local and network activity by privileged users, nonprivileged users, and applications to prevent data loss and fraud and to automate regulatory compliance reporting. System z environments, which often support transactional and financial enterprise applications, present unique security and audit challenges because they can't afford downtime or system latency. SecureSphere DGZ provides comprehensive security and an ironclad audit trail covering all paths into and out of DB2 databases on the z/OS platform. Network activity is captured directly by SecureSphere, while local activity by privileged users and administrators is collected through the integration of IBM Audit Management Expert (AME), a native tool specifically developed for IBM z/OS environments. </p>
<P>

For details, go to <a href="http://imperva.com" target="_blank">imperva.com</a>.</p>
<h3><a href="http://staranalytics.com" target="_blank">Star Integration Server's New DB2 Support</a></h3>
<P>

Star Analytics' latest release of Star Integration Server 2.5 offers new support for IBM DB2 Version 8.2 or higher. A simple, cost-effective, automated method for extracting and sharing data from finance applications with relational data stores and business intelligence applications, the Star Integration Server with DB2 connectivity can be a vital component in the fast enablement of Cognos reporting for users of Oracle Hyperion technology.</p>
<P>

For details, go to <a href="http://staranalytics.com" target="_blank">staranalytics.com</a>.</p>
<P>
<h3><a href="http://tripwire.com" target="_blank">Tripwire Supports IBM i5/OS and DB2</a></h3>
<P>

Tripwire Inc. announced support for the IBM i5/OS operating system and IBM DB2 data servers in its Tripwire Enterprise product, which provides change control. IBM i5/OS (formerly known as AS/400) and DB2 are widely used in infrastructure subject to Payment Card Industry (PCI) Data Security Standards as well as many other regulatory requirements. With support for IBM i5/OS, Tripwire Enterprise now provides an end-to-end configuration solution for the platform used by over 16,000 banks worldwide and by 95 percent of Fortune 100 companies.</p>
<P>

Using Tripwire to ensure that the proper processes and compliance reporting are in place helps create a single, integrated environment for continuous compliance. Tripwire Enterprise immediately alerts IT staff to change and release management policy exceptions, allowing them to be investigated and resolved. </p>
<P>

For details, go to <a href="http://tripwire.com" target="_blank">tripwire.com</a>. </p>

				
					
<h3><a href="http://intellimagic.net" target="_blank">RMF Magic Version 5 Adds DB2 Support</a> </h3>
<P>

Version 5 of IntelliMagic's RMF Magic disk-performance analysis product now supports DB2 as a database repository. The DB2 database can be on z/OS or on Windows (DB2 Express). New functions in RMF Magic Version 5 include views from the application perspective, showing which jobs and storage groups generate most of the I/O load and which applications are affected by storage delays. Back-end access density reporting helps administrators select the best storage devices for each application, from high-performance to low-cost drives. </p>
<P>

For details, go to <a href="http://intellimagic.net" target="_blank">intellimagic.net</a>.</p>
<h3><a href="http://opentechsystems.com" target="_blank">DR/Xpert for DB2</a></h3>
<P>

The new release of DR/Xpert for DB2 from OpenTech Systems now automates backup and recovery for DB2 system objects, including the directory and catalog, in addition to user data. To automatically ensure recoverability, DR/Xpert for DB2 audits the DB2 catalog and builds the jobs to drive DB2 utilities from IBM, BMC, or CA to back up and recover user data and system objects. DR/Xpert for DB2 also performs intelligent grouping of DB2 objects and determines the most efficient object image copy and recovery. In addition, DR/Xpert for DB2 supports user-defined recovery groups so DB2 objects can be grouped together and assigned a recovery priority. </p>
<P>

For details, go to <a href="http://opentechsystems.com" target="_blank">opentechsystems.com</a>.</p>
<h3><a href="http://ibm.com/press/us/en/pressrelease/26478.wss" target="_blank">Kookmin Bank Selects IBM System z10 and DB2</a> </h3>
<P>

Korea's largest banking institution, Kookmin Bank, will adopt IBM System z10 as the platform for a next-generation core banking application system that will consolidate all of the bank's global business units. </p>
<P>

As the bank's service provider, IBM is currently developing the framework of the project and building a customer-centric, user-friendly application system. The responsive and flexible IT infrastructure will provide 24x7 availability and allow the bank to respond promptly to the changing financial environment. Other benefits of the system include business continuity based on a seamless disaster recovery solution, minimum deployment and stronger monitoring, as well as improved support for the bank's global operations. In turn, the bank will be able to leverage the new technology, devices, and media to provide world-class services to its customers globally. </p>
<P>

For details, go to <a href="http://ibm.com/press/us/en/pressrelease/26478.wss" target="_blank">ibm.com/press/us/en/pressrelease/26478.wss</a>.</p>
<h3><a href="http://www.relarc.com/smartrestart/" target="_blank">RAI Smart/RESTART V.10 </a></h3>
<P>

Relational Architects International (RAI) announced Smart/RESTART Version 10.1 with full support for z/OS V1.10 and the latest DFSMS facilities (such as extended addressing volumes). </p>
<P>

Smart/RESTART is a robust solution that conserves the batch window by enabling failing batch applications to resume from a checkpoint rather than rerun from the beginning. Jobs can be resubmitted for restart without JCL changes. The product's comprehensive commit scope guarantees that changes to DB2, WebSphere MQ, IMS, and other RRS-compliant resources stay in sync with a program's sequential file and cursor position, working storage, and random VSAM updates. </p>
<P>

Smart/RESTART supports both new and existing applications, and is compatible with production control software and interactive debugging tools.</p>
<P>

For details, go to <a href="http://www.relarc.com/smartrestart" target="_blank">relarc.com/smartrestart</a>.</p>
<hr/>
<P>
<h2>Business Intelligence</h2>
<h3><a href="http://ibm.com/press/us/en/pressrelease/26614.wss" target="_blank">Elie Tahari Uses IBM Technology to Lift Profits</a></h3>
<P>

Global fashion design company Elie Tahari is using IBM business intelligence technology to gain greater visibility into buying habits, merchandising, and its supply chain. Since implementing the system, the retailer has boosted sales by more than 10 percent while cutting operating costs. </p>
<P>

Elie Tahari's collection, carried in 40 countries and in more than 600 U.S. stores, changes throughout the year. Better supply chain insight helps the company keep up with customer demand based on the season's trends and geography. With the new business intelligence solution, Elie Tahari has achieved savings of more than 30 percent in managing its supply chain and transferring merchandise from warehouse to stores. To date, the Cognos BI solution has given Elie Tahari employees better visibility into all of their critical business information, such as trends in customer orders. The IBM software has also reduced the risk of manual reporting errors by moving reporting to a new electronic system. In the past, it took days to compile information manually; that information is now updated every five minutes and can be viewed by all departments across the company, regardless of location. </p>
<P>

For details, go to <a href="http://ibm.com/press/us/en/pressrelease/26614.wss" target="_blank">ibm.com/press/us/en/pressrelease/26614.wss</a>.</p>
<h3><a href="http://sterlingcommerce.com" target="_blank">Sterling Commerce and IBM Partner to Transform Retail</a> </h3>
<P>

Sterling Commerce, an AT&amp;T Inc. company, selected IBM as a global partner to help clients in retail and other industries simplify their IT infrastructures, reduce costs, and transform business processes. Sterling Commerce and IBM are now defining go-to-market projects based on several IBM industry frameworks to be supported by Sterling Commerce software solutions. </p>
<P>

The Sterling Selling and Fulfillment Suite is now validated with the IBM Retail Integration Framework, an SOA-based enterprise-software platform. </p>
<P>

The companies will extend their retail experience to other industries using additional IBM industry frameworks specifically designed for the manufacturing, financial services, communications and media, and entertainment sectors. </p>
<P>

Sterling Selling and Fulfillment Suite and Sterling Commerce integration solutions for managed file transfer and business-to-business connectivity and collaboration are now enabled on IBM's System p servers, DB2, and WebSphere middleware products. Sterling Selling and Fulfillment Suite also comes embedded with IBM Cognos 8 Business Intelligence. </p>
<P>

For details, go to <a href="http://sterlingcommerce.com" target="_blank">sterlingcommerce.com</a>.</p>
<hr/>
<P>
<h2>Content Management </h2>
<h3><a href="http://www.ibm.com/think" target="_blank">IBM to Make Health Records Smarter</a></h3>
<P>

IBM is helping more than 1,000 hospitals worldwide implement smarter healthcare systems for ensuring patient safety, improving efficiency, and reducing medical errors through electronic medical records (EMRs). The new EMR systems, built on IBM open technology for integrating and managing medical data, can provide medical personnel instant access to pertinent information. </p>
<P>

Memorial Hermann Hospital System (MHHS) adopted IBM software and services in concert with IBM Business Partner CGI's Sovera solution to provide convenient, 24x7 Web-based access to MHHS patients' financial information, such as copies of patients' insurance cards, which are scanned into the system at registration sites across more than 25 locations. Similarly, the Health Information Management department uses a high-volume, centralized scanning center to capture the full range of clinical documentation related to patient care. In just 20 months, MHHS realized more than $1.2 million in operational cost savings. </p>
<P>

IBM also launched a new suite of healthcare information sharing and analytics technologies at the Guang Dong Hospital of Traditional Chinese Medicine (TCM). The pioneering system, dubbed CHAS (Clinical and Health Records Analytics and Sharing), is designed to enable the sharing of EMRs that incorporate TCM and modern Western medicine data across the hospital network. Central to the solution is a standardized terminology system that enables efficient sharing of information across different departments of the hospital and, eventually, outside the hospital to other healthcare facilities. By integrating health records that combine Eastern and Western medicine into one standardized system and applying sophisticated analytics, CHAS can also provide a way for healthcare practitioners to better understand which treatment plans and techniques from each approach work best for specific diseases and medical conditions. </p>
<P>

For details, go to <a href="http://www.ibm.com/think" target="_blank">www.ibm.com/think</a>.</p>
<hr/>
<P>
<h2>Conferences And Events</h2>
<h3>April</h3>
<P>

<strong><a href="http://edw2009.wilshireconferences.com" target="_blank">Enterprise DataWorld</a><br>
</strong>April 5-9<br>
Tampa, Fla.<br>
<a href="http://edw2009.wilshireconferences.com" target="_blank">edw2009.wilshireconferences.com</a></p>
<P>

<strong><a href="http://iiug.org/conf/2009/iiug" target="_blank">International Informix Users Group Conference</a><br>
</strong>April 26-29<br>
Overland Park, Kan.<br>
<a href="http://iiug.org/conf/2009/iiug" target="_blank">iiug.org/conf/2009/iiug</a></p>
<P>

<strong><a href="http://www.gartner.com/it/page.jsp?id=676310" target="_blank">Gartner Risk Management &amp; Compliance Summit</a><br>
</strong>April 29 - May 1<br>
Chicago<br>
<a href="http://www.gartner.com/it/page.jsp?id=676310" target="_blank">www.gartner.com/it/page.jsp?id=676310</a></p>
<h3>May</h3>
<P>

<strong><a href="http://tdwi.org/display.aspx?id=9300" target="_blank">TDWI World Conference Spring 2009</a><br>
</strong>May 3-8<br>
Chicago<br>
<a href="http://tdwi.org/display.aspx?id=9300" target="_blank">tdwi.org/display.aspx?id=9300</a></p>
<P>

<strong><a href="http://2009.idug.org/na" target="_blank">IDUG 2009 - North America</a><br>
</strong>May 11-15<br>
Denver<br>
<a href="http://2009.idug.org/na" target="_blank">2009.idug.org/na</a></p>
<P>

<strong><a href="http://cognos.com/cognosforum" target="_blank">IBM Cognos Forum</a><br>
</strong>May 12-15<br>
Orlando, Fla.<br>
<a href="http://cognos.com/cognosforum" target="_blank">cognos.com/cognosforum</a></p>
<P>

<strong><a href="http://cloudsummit.com" target="_blank">Enterprise Cloud Summit</a><br>
</strong>May 18-19<br>
Las Vegas<br>
<a href="http://cloudsummit.com" target="_blank">cloudsummit.com</a></p>
<h3>June</h3>
<P>

<strong><a href="http://debtechint.com/dg2009" target="_blank">Data Governance Conference</a><br>
</strong>June 1-4<br>
San Diego<br>
<a href="http://debtechint.com/dg2009" target="_blank">debtechint.com/dg2009</a></p>
<P>

<strong><a href="http://ibm.com/software/uk/data/conf" target="_blank">Information on Demand Conference - EMEA</a><br>
</strong>June 2-5<br>
Berlin<br>
<a href="http://ibm.com/software/uk/data/conf" target="_blank">ibm.com/software/uk/data/conf</a></p>
<P>

<strong><a href="http://linkeddataplanet.com" target="_blank">LinkedData Planet</a><br>
</strong>June 17-18<br>
New York<br>
<a href="http://linkeddataplanet.com" target="_blank">linkeddataplanet.com</a></p>
<h3>October</h3>
<P>

<strong><a href="http://ibm.com/software/data/conf" target="_blank">IBM Information On Demand 2009 Global Conference</a><br>
</strong>October 25-29<br>
Las Vegas<br>
<a href="http://ibm.com/software/data/conf" target="_blank">ibm.com/software/data/conf</a></p>
<P>

				]]></body>
		</item>
	
		<item>
			<title><![CDATA[Dream Job: Taking the Helm]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300365&cid=RSSfeed]]></link>
			<description><![CDATA[DB2 DBA Jeffrey Benner is about to leave the familiar behind to embark on a new adventure.]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:04 EDT</pubDate>
			<keywords><![CDATA[DB2 How-To Blog, Jeffrey Benner, Twitter]]></keywords>
			<blurb><![CDATA[DB2 DBA Jeffrey Benner is about to leave the familiar behind to embark on a new adventure.]]></blurb>
			<authors><![CDATA[Kim Moutsos]]></authors>
			<body><![CDATA[
			
					
<P>
<img src="http://i.cmpnet.com/ibmdatabasemag/2009-issue1/dreamjob.jpg" alt="Dream Job: Taking the Helm" class="Image_Float-Left" border="1" width="250">In 2009, Jeffrey Benner's language skills (near fluency in Hindi) and the book he wrote about Indian foreign policy (products of a study abroad year through his university in the early '80s) seem like carefully thought out career moves. In fact, they were carefully thought out moves, just for an entirely different career. </p>
<P>

Today, the Chicago-based IT professional works as a DB2 DBA contractor for global online travel company Orbitz. Over the past 12 years, he has held similar contract positions at E.piphany, TransUnion, Blue Cross &amp; Blue Shield of Illinois, Northern Trust, Caremark, Bradford Exchange, Montgomery Wards, and other companies. He also carves out time to write the increasingly popular <a href="http://www.ebenner.com/db2dba_blog/" target="_blank">DB2 DBA How-To blog</a> (an expression of his dedication to documentation and communication) and regularly Twitters on DB2 and other IT subjects (he's <a href="http://twitter.com/ebennerdotcom" target="_blank">ebennerdotcom</a>).</p>
<P>

Yet Benner earned his academic degrees (a B.A. from the University of Wisconsin, Madison, in 1983 and an M.A. from the University of Chicago in 1985) in international relations. Had he followed his original plan, he would have gone on for a Ph.D. and taken a teaching position at a university. </p>
<P>

Academia, though, felt restrictive. "I could have ended up in Iowa with no chance to travel," Benner says, an unappealing prospect to a person who, as a child in rural Indiana, spent hours imagining all the places the train tracks that ran through his parents' property could carry him. </p>
<P>

But the real problem was the lack of a disciplined science in the field. One of his early career inspirations came from sci-fi great Isaac Asimov's <em>Foundation</em> series; specifically, Benner says, the idea of applying an algorithm to analyze human behavior and predict the future. In reality, he found the field to be, as he puts it, "pretty much B.S."</p>
<P>

The resulting career crisis led to four years in financial services positions that he disliked. Then, he tried his hand as a COBOL programmer at Management Data Communications during the late 1980s, when companies would hire and train people with no programming background.</p>
<P>

Everything clicked. The self-proclaimed geek found himself surrounded by like-minded, intelligent coworkers. He loved it all &mdash; the clunky dumb terminals, the mainframe programming, and the pay ("I couldn't imagine making that much for having so much fun," he says).</p>
<P>

After picking up CICS skills, he taught himself DB2 for OS/390 programming. Seeing the potential for client/server, he learned OS/2 while working for the Loyola University Medical Center, then added Micro Focus COBOL skills, which he drew on to land a position at the Chicago Mercantile Exchange in 1992. His three-year stint there proved to be, by his choice, the last job in which he worked directly for an employer (rather than a broker). His first project using non-mainframe DB2 came in the late 1990s, when he agreed to learn DB2 for AIX to help Montgomery Wards develop a better inventory system.</p>
<P>

Benner says working as a contractor motivates him to keep his skills and personal presentation sharp ("You're always aware you're about to be thrown back into the water," as he puts it). Plus, the end of each contract presents an opportunity to take time off for travel, writing, and other pursuits. And, he says, the pay is better than he'd make as a regular employee. </p>
<P>

One benefit he's yet to accrue is the freedom to take positions anywhere in the world. After deferring that option to give his children a stable home, Benner is ready to cash it in. With his contract ending, his daughter graduating, and his step-daughter in college, he and his wife are selling their home in the Chicago suburbs in June and looking for new positions anywhere in the world. Benner hopes to find one that makes use of his familiarity with Hindi and Indian culture.</p>
<P>

The couple is targeting both U.S. coasts, Canada, India, Australia, and Europe in their search. But this former international relations student will gladly go anywhere opportunity leads him. </p>
				]]></body>
		</item>
	
		<item>
			<title><![CDATA[Asked and Answered: DB2 Storage Questions ]]></title>
			<link><![CDATA[http://ibmdatabasemag.com/story/showArticle.jhtml?articleID=216300372&cid=RSSfeed]]></link>
			<description><![CDATA[Advice on DB2 storage specifics such as LUNs and the number of disks per container.]]></description>
			<pubDate>Tue, 31 Mar 2009 17:00:03 EDT</pubDate>
			<keywords><![CDATA[Logical Unit Numbers, DB2 Storage, Roger E. Sanders, Distributed DBA, Prefetch Size, LUNs]]></keywords>
			<blurb><![CDATA[Advice on DB2 storage specifics such as LUNs and the number of disks per container.]]></blurb>
			<authors><![CDATA[Roger E. Sanders]]></authors>
			<body><![CDATA[
			
					<img src="http://i.cmpnet.com/v2.db2mag.com/columns/sanders_roger.jpg" alt="Roger Sanders" width="90" height="90" class="Image_Float-Left" border="1" />
 <h3>Q: How do I spread 150GB over Logical Unit Numbers (LUNs)? Would it be a problem if a storage person gave me only one directory (mount point) on one LUN? If the data is to be spread over many LUNs, how do I calculate the optimal number of LUNs (2, 3, or 4)?</h3>
<P>

<strong>A:</strong> Don't confuse LUNs with physical disks; a single LUN can span multiple disks, and you can have multiple LUNs on the same set of physical disks. Generally, more LUNs provide better performance, but fewer, larger LUNs are sometimes easier to manage. The idea is to spread the workload across the appropriate number of physical disk spindles.</p>
<P>

When deciding on the appropriate number of disks to use, you should first look at the database's I/O requirements. A traditional disk drive can deliver between 150 and 170 I/O operations per second (IOPS). By design, intelligent storage systems mask the physical limitations of disk drives by utilizing cache, providing parallel access to disks, and using proprietary algorithms to optimize read and write operations. As a result, most storage systems are capable of delivering more IOPS than the sum of the individual disks. However, it is a good idea to err on the side of caution by estimating 150 to 170 IOPS per disk for IOPS calculations. Therefore, if you determine that your database needs 1,500 IOPS, you would need to distribute it across at least 10 physical drives (1,500 divided by 150 IOPS per drive). </p>
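<P>

The drive-count arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name <code>disks_needed</code> is hypothetical, and the per-disk figures are the conservative 150 to 170 IOPS estimates quoted in the answer, not measurements from any particular storage system.</p>

```python
import math

# Conservative per-disk estimates quoted in the answer (assumed, not measured).
IOPS_PER_DISK_LOW = 150
IOPS_PER_DISK_HIGH = 170


def disks_needed(required_iops: int, iops_per_disk: int = IOPS_PER_DISK_LOW) -> int:
    """Return the smallest number of physical drives that covers required_iops."""
    if required_iops <= 0 or iops_per_disk <= 0:
        raise ValueError("IOPS values must be positive")
    # Round up: a fraction of a drive still means one more whole drive.
    return math.ceil(required_iops / iops_per_disk)


if __name__ == "__main__":
    print(disks_needed(1500))                      # 10 drives at 150 IOPS each
    print(disks_needed(1500, IOPS_PER_DISK_HIGH))  # 9 drives at 170 IOPS each
```

<P>

Defaulting to the lower figure keeps the estimate on the cautious side, as the answer suggests.</p>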
<P>

DB2 experts at IBM recommend using between eight and 20 disk spindles per CPU to avoid heavy I/O wait; CPU type, IOPS, and throughput requirements determine how many spindles are actually needed. They also recommend creating one LUN on the disks used, and a single file system on the LUN itself.</p>
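<P>

As a rough illustration of the per-CPU guideline, the following Python sketch turns a CPU count into the corresponding spindle range. The helper name <code>spindle_range</code> is hypothetical, and the result is only a starting point; as noted above, CPU type, IOPS, and throughput requirements determine the actual number.</p>

```python
from typing import Tuple

# Guideline quoted in the answer: 8 to 20 disk spindles per CPU.
MIN_SPINDLES_PER_CPU = 8
MAX_SPINDLES_PER_CPU = 20


def spindle_range(cpu_count: int) -> Tuple[int, int]:
    """Return the (low, high) spindle counts suggested for cpu_count CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return (cpu_count * MIN_SPINDLES_PER_CPU, cpu_count * MAX_SPINDLES_PER_CPU)


if __name__ == "__main__":
    low, high = spindle_range(4)
    print(f"4 CPUs: {low} to {high} spindles")  # 4 CPUs: 32 to 80 spindles
```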
<h3>Q: To calculate prefetch size, you have to know the number of disks per container. According to the manual, the prefetch size is calculated as follows: <br />
<code>Prefetch Size = (no of containers) * (no of disks per container) * extent size.</code></h3>
<P>

By default, the number of disks per container is six. I previously have used a 9GB RAID 5 (5+1) configuration system. In this system, the number of disks per container is five. What should I be asking my storage person to use as the correct value for number of disks per container?</p>
<P>

<strong>A:</strong> If your version of DB2 supports it, set the prefetch size to <code>AUTOMATIC</code> and forget it. (You can also assign the value <code>AUTOMATIC</code> to the <code>dft_prefetch_sz</code> database configuration parameter.) However, you do need to make sure the <code>DB2_PARALLEL_IO</code> registry variable is set appropriately if you go this route. You set this by executing a command that looks like this: </p>
<P>

<code>db2set DB2_PARALLEL_IO=&#91;TS_ID&#93;:&#91;DisksPerCtr&#93;,...</code></p>
<P>

where:</p>
<ul>
  <li><em>TS_ID</em> identifies one or more individual table spaces by their numeric ID. An asterisk (*) indicates all table spaces.</li>
  <li><em>DisksPerCtr</em> identifies the number of physical disks used by each table space container assigned to the table space ID specified (not including any parity disks used).</li>
</ul>
<P>

So, to tell DB2 that the containers for all table spaces reside on a RAID 5 (7+1) group, set the <code>DB2_PARALLEL_IO</code> registry variable by executing a <code>db2set</code> command such as:</p>
<P>

<code>db2set DB2_PARALLEL_IO=*:7</code></p>
<P>

(Ask your storage administrator how many physical disks are used, excluding parity disks.)</p>
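<P>

If you manage several table spaces, a small script can help assemble the value in the <code>TS_ID:DisksPerCtr</code> format described above. The Python sketch below only builds the command string; it does not call DB2. The helper names <code>data_disks</code> and <code>parallel_io_command</code> are hypothetical, and the sketch assumes the RAID layout is written as <em>data+parity</em> (for example, 7+1).</p>

```python
from typing import Dict, Union


def data_disks(raid_layout: str) -> int:
    """Parse a layout written as 'data+parity' (e.g. '7+1'); return the data-disk count."""
    data, _, parity = raid_layout.partition("+")
    if not data.isdigit() or not parity.isdigit():
        raise ValueError(f"expected a layout such as '7+1', got {raid_layout!r}")
    # Parity disks are excluded, as the answer specifies.
    return int(data)


def parallel_io_command(disks_by_ts: Dict[Union[int, str], int]) -> str:
    """Build a db2set command from {table space ID or '*': disks per container}."""
    pairs = [f"{ts_id}:{disks}" for ts_id, disks in disks_by_ts.items()]
    return "db2set DB2_PARALLEL_IO=" + ",".join(pairs)


if __name__ == "__main__":
    # All table spaces on a RAID 5 (7+1) group.
    print(parallel_io_command({"*": data_disks("7+1")}))
    # prints: db2set DB2_PARALLEL_IO=*:7
```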
<P>

Otherwise, prefetch size can be calculated by multiplying the RAID stripe size by the number of RAID devices used, not including parity; a whole multiple of this product also works. The prefetch value used should also be a multiple of the extent size of the table space. </p>
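<P>

The manual calculation can be sketched in Python as follows. Several things here are assumptions made for illustration: all sizes are expressed in pages, the function names are hypothetical, and the sketch satisfies both rules at once by taking the least common multiple of the stripe product and the extent size, which is one reasonable way to do it rather than a documented DB2 procedure.</p>

```python
import math


def prefetch_size(stripe_pages: int, data_disk_count: int, extent_pages: int) -> int:
    """Smallest prefetch size (in pages) that is a whole multiple of both
    stripe_pages * data_disk_count and the table space extent size."""
    if min(stripe_pages, data_disk_count, extent_pages) <= 0:
        raise ValueError("all inputs must be positive")
    product = stripe_pages * data_disk_count
    # Least common multiple of the stripe product and the extent size.
    return product * extent_pages // math.gcd(product, extent_pages)


def prefetch_from_containers(containers: int, disks_per_container: int, extent_pages: int) -> int:
    """The formula quoted in the question: containers * disks per container * extent size."""
    return containers * disks_per_container * extent_pages


if __name__ == "__main__":
    # Hypothetical values: 16-page stripe, RAID 5 (7+1), 32-page extents.
    print(prefetch_size(16, 7, 32))            # 224 pages (2 x 112 and 7 x 32)
    # Hypothetical values: 1 container on RAID 5 (5+1), 32-page extents.
    print(prefetch_from_containers(1, 5, 32))  # 160 pages
```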
<P>

<em>Special thanks to Aamer Sachedina, senior technical staff member at the IBM Toronto Lab, and Jim Wentworth, consultant corporate systems engineer at EMC Corp., for validating my responses to these questions.</em></p>
<hr width="60%"/>
<P>
<em>Roger E. Sanders, a consultant corporate systems engineer at EMC Corp., is the author of 17 books on DB2 for Linux, Unix, and Windows and teaches classes at many DB2 conferences. His latest book is titled</em> DB2 9 for Linux, UNIX, and Windows Advanced Database Administration: Certification Study Guide<em> (MC Press, 2008).</em></p>
				]]></body>
		</item>
	

</channel>
</rss>

