Since the introduction of native XML support in major database management systems, the amount of business information cast in this popular format has skyrocketed. The reasons are clear: XML is unrivaled both as a data exchange format for industry standards and as a data interchange format for application developers. Most industries are developing standards for data interchange in XML, and some XML standards, like Extensible Business Reporting Language (XBRL) for the reporting of business and financial data, transcend all industries.
Yet XML's remarkable decade of growth nearly stalled due to the initial difficulty of storing and managing information in an XML format. As companies developed applications to store or retrieve XML data, they often used their existing infrastructures, including file systems and relational databases. These systems weren't designed to handle XML, and the resulting workarounds required expensive transformations with significant processing overhead. Some tried XML-only databases, only to find that segregating XML data in its own repository created yet another system that had to be separately maintained.
IBM addressed these challenges by building native XML support into DB2, allowing XML to be stored and managed in its native format together with data in relational format. This XML capability, called DB2 pureXML, shortens development time, lowers maintenance costs, and dramatically improves application performance when storing and retrieving data.
The Benefits of Native XML
There are many reasons why XML data should be stored in its native form, as the following customer scenarios show.
Simplifying the IT environment. Using DB2 pureXML, Storebrand Group, a Norwegian financial services company, reduced the lines of code for writing to and reading from its database by 65 percent. With less code to develop, test, and maintain, developers are freed to work on more productive tasks. In addition, Storebrand found that schema changes are now easier. In the past, adding just a single field took a day of work (development and testing) and a week to implement because of the processes involved with database changes. Now, developers can simply change the pointer to the schema in a DB2 XML configuration file, which takes about five minutes.
Schema changes previously required so much effort due to the nature of the workarounds created to handle XML before native storage options existed. One method, called "shredding" or "decomposition," maps XML into a tabular format. Another approach puts XML data into a single large object (LOB) cell in a table. Both approaches work, but there are significant drawbacks, especially as the amount of XML data grows.
Shredding is a popular approach for quickly retrieving individual pieces of information from the database. However, this fast query performance comes at a cost in terms of the effort required to map the XML data into a table and the processing overhead associated with inserting information into the database. This cost increases if the original XML data has to be recreated from the shredded fields.
Before XML can be shredded, a relational schema must be designed. This process can be labor-intensive, although it can be partially automated with off-the-shelf tools. However, the resulting tables will need to be carefully examined and optimized. After designing the relational schema, the environment that actually maps the XML to the relational schema must be set up. Then comes the development and testing of code for using the data (and this code is typically complex because it requires unwieldy SQL statements with multiple JOINs).
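To make the shredding workflow concrete, here is a minimal Python sketch. It uses the standard library's sqlite3 module as a stand-in for a relational database; it is not DB2, and the order document, table names, and column names are all hypothetical. It shows the two costs described above: mapping code is needed to insert the data, and a JOIN is needed to recreate the original document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document; element and attribute names are invented.
ORDER_XML = """
<order id="1001">
  <customer>Acme Corp</customer>
  <item sku="A-1" qty="2"/>
  <item sku="B-7" qty="5"/>
</order>
"""

def shred(conn, xml_text):
    """Map one XML order onto two relational tables."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute(
        "INSERT INTO orders (order_id, customer) VALUES (?, ?)",
        (order_id, root.findtext("customer")),
    )
    conn.executemany(
        "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
        [(order_id, item.get("sku"), int(item.get("qty")))
         for item in root.findall("item")],
    )

def rebuild(conn, order_id):
    """Recreate the XML from the shredded rows; this requires a JOIN."""
    rows = conn.execute(
        "SELECT o.customer, i.sku, i.qty "
        "FROM orders o JOIN order_items i ON i.order_id = o.order_id "
        "WHERE o.order_id = ? ORDER BY i.rowid",
        (order_id,),
    ).fetchall()
    root = ET.Element("order", id=str(order_id))
    ET.SubElement(root, "customer").text = rows[0][0]
    for _, sku, qty in rows:
        ET.SubElement(root, "item", sku=sku, qty=str(qty))
    return ET.tostring(root, encoding="unicode")

# In-memory database standing in for the relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")

shred(conn, ORDER_XML)
rebuilt = rebuild(conn, 1001)
print(rebuilt)
```

Even this two-table toy needs a hand-written mapping in each direction. Every nested or repeating element in a real schema adds another table and another JOIN.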
The significant overhead for shredding XML data into a relational schema is only a part of the story. XML schema changes are a fact of life, and they can play havoc with relational schemas, mapping processes, and application code. That's why many organizations realize such gains from adopting DB2 pureXML storage.
As a practical example, consider the Financial products Markup Language (FpML) industry standard protocol for complex financial products. With DB2 pureXML, dealing with FpML messages is straightforward: Just store the complete message in an XML column in a single table. However, in some implementations, using shredding to store FpML messages could require working with more than 475 separate database tables. Maintaining 475 tables is significantly more complex than managing just one.
Boosting IT productivity. The UCLA Medical Center uses DB2 pureXML to manage patient medical records, diagnostic images, and even doctors' handwritten notes. Hospital employees insert the information into the Patient Oriented Document System, which gives doctors quick access to the information to ensure high-quality patient care. By using DB2 pureXML, UCLA realizes significant productivity improvements: The time required for certain IT projects is reduced from weeks to hours.
One of the advantages of working with a DB2 pureXML repository is that the data doesn't require any special treatment or transformations before storage or retrieval. XML is stored directly in the repository and retrieved from the same location. This simplified way of working with XML data reduces the time needed for many common tasks. Such time savings are increasingly valuable in today's IT environments.
The Complete XML Toolbox
DB2 pureXML is a great advance for XML data persistence. Thanks to pureXML, many organizations are reporting significant gains in both performance and productivity.
However, when it comes to storing XML data, pureXML is just one tool in your toolbox. DB2 pureXML is probably the most important tool, but there are occasions when alternative approaches to storing XML in DB2 may prove better. Sometimes it is better to store XML data in a CLOB or BLOB. And sometimes it's better to use shredding.
IBM's Conor O'Mahoney, the creative force behind the Native XML Databases blog, explains how to make sense of the various XML options and when to use each.
Improving information integration. China Huadian Corp. (CHD), which manages more than 100 utility and financial services companies, created a flexible data analysis and reporting system built on DB2 9. The system manages data from diverse facilities, adjusting easily to new data reporting and delivery requirements, new types of industrial facilities, and the removal or addition of assets. Report data from each of CHD's branches is stored in DB2 pureXML. This setup accommodates various schemas and report formats while also making the data accessible to a variety of organizations, which improves communication between business and IT staff. DB2 pureXML makes it easy to add, update, or delete reported items. Detailed production and cost information is integrated and displayed using the company's analysis and reporting applications.
Integrating information among various business units and locations can be difficult and frustrating. A flexible, automated, and scalable platform for information reporting makes the process much easier to manage. By building a better information reporting system, CHD gained greater business insight and agility. It also reduced cost and labor for implementing application changes and improved responses to changes in regulatory and management reporting requirements.
Supporting XML schema flexibility and evolution. One additional area to consider when evaluating XML storage options pertains to XML schemas, which define the structure of XML data. An XML schema describes the XML elements and attributes that can appear in the data, where they can appear, and how often. Validating the XML data means making sure that the XML data adheres to the rules set out in the XML schema.
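The idea of validation can be illustrated with a deliberately simplified Python sketch. It is not an XML Schema (XSD) processor and does not use any DB2 feature: the rule table, the claim document, and the validate function are hypothetical. It checks only which child elements may appear under the root and how often, and ignores ordering, attributes, and data types, all of which a real schema can constrain.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a <claim> document:
# child element name -> (minimum, maximum) occurrences; None means unbounded.
CLAIM_RULES = {
    "policyNumber": (1, 1),
    "claimant": (1, 1),
    "lineItem": (1, None),
    "note": (0, None),
}

def validate(xml_text, root_tag, rules):
    """Return a list of rule violations; an empty list means the document passes."""
    root = ET.fromstring(xml_text)
    if root.tag != root_tag:
        return [f"expected root <{root_tag}>, found <{root.tag}>"]

    # Count how often each direct child element occurs.
    counts = {}
    for child in root:
        counts[child.tag] = counts.get(child.tag, 0) + 1

    errors = []
    for tag in counts:
        if tag not in rules:
            errors.append(f"unexpected element <{tag}>")
    for tag, (low, high) in rules.items():
        n = counts.get(tag, 0)
        if n < low:
            errors.append(f"<{tag}> occurs {n} time(s), expected at least {low}")
        elif high is not None and n > high:
            errors.append(f"<{tag}> occurs {n} time(s), expected at most {high}")
    return errors

good = ("<claim><policyNumber>P-1</policyNumber><claimant>J. Doe</claimant>"
        "<lineItem>Windshield</lineItem></claim>")
bad = ("<claim><claimant>J. Doe</claimant><lineItem>Windshield</lineItem>"
       "<discount>10</discount></claim>")

print(validate(good, "claim", CLAIM_RULES))  # no violations
print(validate(bad, "claim", CLAIM_RULES))   # missing policyNumber, unexpected discount
```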
XML schemas define an agreed-upon vocabulary of XML tags for a specific application scenario (such as financial trading, medical records, or insurance claims). But that vocabulary and the application scenarios it supports can change over time, so schema flexibility and schema evolution are important. To explain the need for these features, let's look at a couple of scenarios.
Many tax authorities in the United States store information from tax forms in XML format. Tax forms change nearly every year, and changes to tax forms mean changes to the XML schema. Yet the new schema won't necessarily be the correct one for all records. Instead, records should be validated against the schema that was in existence when the record was created. In this case, schema flexibility — the ability to cater to a wide range of XML schema needs — is a key requirement. The application must have the option to validate the cells in a database column against different schemas (or to not validate against any schema).
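The tax-form scenario can be sketched in a few lines of Python. This is an illustration of the requirement, not of how DB2 implements it: the schema registry, the tax years, the element names, and validate_record are all hypothetical, and each "schema" is reduced to a set of required elements. The point is that the schema is chosen per record, and that validation can be skipped entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: tax year -> elements that year's form requires.
SCHEMAS = {
    2008: {"taxpayerId", "income"},
    2009: {"taxpayerId", "income", "energyCredit"},
}

def validate_record(xml_text, schema_year=None):
    """Check a record against the schema for schema_year.

    Passing None skips validation, so the record is accepted as-is.
    """
    if schema_year is None:
        return True
    required = SCHEMAS[schema_year]
    present = {child.tag for child in ET.fromstring(xml_text)}
    return required <= present  # every required element must be present

# A return filed under the (hypothetical) 2008 form.
RETURN_2008 = "<return><taxpayerId>123</taxpayerId><income>50000</income></return>"

print(validate_record(RETURN_2008, 2008))  # checked against the schema it was filed under
print(validate_record(RETURN_2008, 2009))  # fails the newer schema: no energyCredit
print(validate_record(RETURN_2008))        # no validation requested
```

Validating the older record against the newest schema would reject data that was perfectly correct when it was filed, which is why the choice of schema has to travel with the record rather than with the column.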
Another common scenario involves storing messages that adhere to one of the major XML industry standards (the healthcare standard HL7 or FpML, for example). Industry standards are continually evolving; moving to a new version of an XML standard usually means a new XML schema.