Robert's Blog


Wednesday, June 9, 2010

Nuggets from DB2 by the Bay, Part 4

The last of my posts with items of information from the 2010 International DB2 Users Group North American Conference, held last month in Tampa (as in Tampa Bay), Florida.

Good DB2 9 for z/OS migration information from David Simpson. David, a senior DB2 instructor with Themis Training, described some things of which people migrating to DB2 9 should be aware. Among these are the following:
  • The pureXML functionality delivered in DB2 9 is quite comprehensive and opens up a lot of possibilities. One of David's colleagues at Themis figured out how to create views that make data stored in XML columns of DB2 tables look like standard relational data.
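
    To give a flavor of the approach (this is my sketch, not the Themis code -- the table, columns, and XML structure are hypothetical, and it assumes the XMLTABLE function that came with DB2 9 pureXML support):

      CREATE VIEW CUST_V (CUST_ID, CUST_NAME, CUST_CITY) AS
        SELECT C.CUST_ID, X.NAME, X.CITY
        FROM CUSTOMER C,
             XMLTABLE('$d/customer' PASSING C.CUST_DOC AS "d"
                      COLUMNS NAME VARCHAR(40) PATH 'name',
                              CITY VARCHAR(30) PATH 'address/city') AS X;

    A program can then SELECT from CUST_V as though the XML data were ordinary relational columns.
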
  • Do you have a handle on your simple tablespace situation? David reminded session attendees that simple tablespaces cannot be created in a DB2 9 environment. That being the case, it would be a good idea to move the data in your existing simple tablespaces to other tablespace types (segmented tablespaces, most likely). Sure, you can still read from, and update, a simple tablespace in a DB2 9 system, but the inability to create such a tablespace could leave you in a tough spot if you were to try to recover a simple tablespace that had been accidentally dropped (David suggested creating a few empty simple tablespaces before migrating to DB2 9, so that you'll have some on hand just in case you need one). You might think that you don't have any simple tablespaces in your DB2 Version 8 system, but you could be wrong there -- David pointed out that the simple tablespace is the default type up through DB2 Version 8 (so an implicitly-created tablespace in a pre-DB2 9 environment will be a simple tablespace).
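
    If you want to check on where you stand, a catalog query along these lines should surface candidates (my sketch, not David's -- it assumes the usual catalog convention that a simple tablespace shows a SEGSIZE of 0, no partitions, and a blank TYPE in SYSIBM.SYSTABLESPACE):

      SELECT DBNAME, NAME
        FROM SYSIBM.SYSTABLESPACE
        WHERE SEGSIZE = 0
          AND PARTITIONS = 0
          AND TYPE = ' ';
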
  • New RRF developments. That's RRF as in reordered row format, a change (introduced with DB2 9) in the way that columns are physically ordered in a table. In the old set-up (now referred to as BRF, or basic row format), varying-length columns in a table (such as those with a VARCHAR data type) are physically stored, relative to other columns, in the order in which they appear in the CREATE TABLE statement. With RRF in effect, varying-length columns are grouped at the end of a table's rows, and that group of varying-length columns is preceded by a set of offset indicators -- one for each varying-length column -- that enable DB2 to very efficiently go right to the start of a given varying-length column. David told attendees that RRF does NOT affect what programs "see", as the logical order of a table's columns does not change with a change to RRF. RRF is a good thing with respect to varying-length-data access performance, but it may cause some issues when tablespaces are compressed (RRF rows sometimes don't compress quite as much as equivalent BRF rows), when data changes related to tables in compressed tablespaces are propagated via "log-scraping" replication tools (you just need to make sure that your replication tool can deal with the new compression dictionary that is created when a tablespace goes from BRF to RRF), and when tablespaces are operated on by the DB2 DSN1COPY utility (this utility doesn't use the SQL interface, so it is sensitive to changes in the physical order of columns even when this has no effect on the columns' logical order in a table).

    Early on with DB2 9, the change from BRF to RRF happened automatically the first time a tablespace created on a pre-9 DB2 system was REORGed in a DB2 9 environment. Various DB2 users asked for more control over the row-format change, and IBM responded with APARs such as PK85881 and PK87348. With the fixes provided by these APARs, you can decide whether you want BRF-to-RRF conversion to occur automatically with certain utility operations (REORG and LOAD REPLACE), or whether you want to explicitly request format conversion on a tablespace-by-tablespace basis. You can also determine whether tablespaces created in a DB2 9 environment will initially have BRF or RRF rows. One way or another, you definitely want to get to RRF at some point.
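
    As an illustration, explicit conversion of an individual tablespace might be requested like this (a sketch based on my reading of the APARs -- I'm assuming the ROWFORMAT keyword they add to REORG and LOAD REPLACE, and the object names are made up):

      REORG TABLESPACE MYDB.MYTS
        SHRLEVEL CHANGE
        ROWFORMAT RRF
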

  • Time to move on from Visual Explain. David mentioned that VE is not supported in a DB2 9 environment -- it doesn't work with new DB2 9 data types (such as XML), and it can produce "indeterminate results" when a query's DB2 9 access plan is one that would not have been possible in a DB2 Version 8 system. If you want a visual depiction of the access plan for a query accessing a DB2 9 database, you can use the free and downloadable IBM Optimization Service Center for DB2 for z/OS, or IBM Data Studio.
  • Bye-bye, BUILD2. David explained that partition-level online REORG in a DB2 9 system does not have a BUILD2 phase (in prior releases of DB2, this was the REORG utility phase during which row IDs in non-partitioned indexes were updated to reflect the new position of rows in a reorganized table partition). That's good, because data in a partition is essentially unavailable during the BUILD2 phase, and BUILD2 can run for quite some time if the partition holds a large number of rows. There's a catch, though: BUILD2 is eliminated because DB2 9 partition-level online REORG reorganizes non-partitioned indexes in their entirety, using shadow data sets. That means more disk space and more CPU time for partition-level REORG in a DB2 9 system. It also means that you can't run multiple online REORG jobs for different partitions of the same partitioned tablespace in parallel. You can get parallelism within one partition-level online REORG job if you're reorganizing a range of partitions (e.g., partitions 5 through 10). Note that in a DB2 10 environment, you can get this kind of intra-job parallelism for an online REORG even if the multiple partitions being reorganized are not contiguous (e.g., partitions 3, 7, 10, and 15).
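
    For reference, a partition-range online REORG of the kind mentioned above might be coded as follows (hypothetical object names, with the PART n:m range syntax and SHRLEVEL CHANGE for online reorganization):

      REORG TABLESPACE MYDB.MYPARTTS
        PART 5:10
        SHRLEVEL CHANGE
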
DB2 for z/OS and application programming. Dave Churn, a database architect at DST Systems in Kansas City, delivered a session on application development in a DB2 context. Dave commented on a number of application-oriented DB2 features and functions, including these:
  • Fetching and inserting chunks of rows. DST has made some use of the multi-row FETCH and INSERT capabilities introduced with DB2 for z/OS Version 8. Dave said that performance benefits had been seen for programs that FETCH rows in blocks of 5-10 rows each, and for programs that INSERT rows in blocks of 20 rows each. The other side of that coin is increased programming complexity (Dave noted that with multi-row FETCH, you're "buffering in your program"). In DST's case, multi-row FETCH is not being used to a great extent, because the increased time required for programmers to write code to deal with multi-row FETCH (versus using traditional single-row FETCH functionality) is generally seen as outweighing the potential performance gain (and that gain will often not be very significant in an overall sense -- as Dave said, "How often is FETCH processing your primary performance pain point?").

    Use of multi-row INSERT, on the other hand, has been found to be more advantageous in the DST environment, particularly with respect to the company's very high-volume, time-critical, and INSERT-heavy overnight batch workload. As with multi-row FETCH, there is an increase in programming complexity associated with the use of multi-row INSERT (among other things, to-be-inserted values have to be placed in host variable arrays declared by the inserting program), but the performance and throughput benefits have often made the additional coding effort worthwhile. Interestingly, others in the audience indicated that they'd seen the same pattern in their shops: multi-row INSERT was found to be of greater use than multi-row FETCH. Dave mentioned that at DST, programs using multi-row INSERT generally specify the NOT ATOMIC CONTINUE ON SQLEXCEPTION option, which tells DB2 NOT to undo the successful inserts in a block if an error is encountered for one or more rows in that block. The programs use the GET DIAGNOSTICS statement to identify any rows in a block that were not successfully inserted; those rows are written to a file for later analysis and action.
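
    To make the mechanics a bit more concrete, here is a minimal embedded-SQL sketch of the pattern Dave described (my illustration, not DST code -- the table, columns, and host variable arrays are hypothetical):

      -- The host variable arrays (e.g., COBOL items defined with
      -- OCCURS 20 TIMES) are assumed to already hold 20 rows of values.
      INSERT INTO ORDER_HIST (ORDER_ID, ORDER_AMT)
        VALUES (:ORDER-ID-ARR, :ORDER-AMT-ARR)
        FOR 20 ROWS
        NOT ATOMIC CONTINUE ON SQLEXCEPTION;

      -- Find out how many error conditions the block INSERT raised...
      GET DIAGNOSTICS :NUM-COND = NUMBER;

      -- ...and, for condition :N, identify the failing row so it can be
      -- written to a file for later analysis and action.
      GET DIAGNOSTICS CONDITION :N
        :ERR-SQLSTATE = RETURNED_SQLSTATE,
        :ERR-ROW = DB2_ROW_NUMBER;
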

  • The new BINARY data type in DB2 9 can be great for some client-server applications. When DB2 for z/OS is used for the storage of data that is inserted by, and retrieved by, programs running on Linux/UNIX/Windows application servers, the BINARY data type can be a good choice: if the data will not be accessed by programs running on the mainframe, why do character conversion? Use of the BINARY data type ensures that character conversion will not even be attempted when the data is sent to or read from the DB2 database.
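
    Defining such a column is straightforward -- a hypothetical example (because the data type is binary, CCSID character conversion simply does not apply):

      CREATE TABLE SESSION_KEY
        (KEY_ID   INTEGER     NOT NULL,
         KEY_VAL  BINARY(16)  NOT NULL,
         KEY_EXT  VARBINARY(200));
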
  • DB2 9 FETCH WITH CONTINUE is useful for really big LOBs. In some cases, a LOB value might be larger than what a COBOL program can handle (that limit is about 128 MB). The FETCH WITH CONTINUE functionality introduced with DB2 9 enables a COBOL program to retrieve a very large LOB in parts.
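
    In sketch form, the flow looks like this (hypothetical cursor and host variable names; FETCH WITH CONTINUE retrieves the first piece of the LOB, and FETCH CURRENT CONTINUE retrieves the remainder, piece by piece):

      -- Get the row and the first piece of the large LOB value
      FETCH WITH CONTINUE C1 INTO :CLOB-PIECE;

      -- If the value was truncated, keep fetching the rest of it
      FETCH CURRENT CONTINUE C1 INTO :CLOB-PIECE;
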
  • MERGE musings. The MERGE statement, new with DB2 9 and sometimes referred to as "UPSERT", is very handy when a set of input records is to be either inserted into a DB2 table or used to update rows already in the table, depending on whether or not an input record matches an existing table row. Dave mentioned that the matching condition (specified in the MERGE statement) will ideally be based on a unique key, so as to limit the scope of the UPDATE that occurs when an input record matches an existing row. DST likes MERGE because it improves application efficiency (it reduces the number of program calls to DB2 versus the previously-required INSERT-ELSE-UPDATE construct) and programmer productivity (same reason -- fewer SQL statements to code). Dave said that DST has used MERGE with both the ATOMIC and NOT ATOMIC CONTINUE ON SQLEXCEPTION options (when the latter is used for MERGE with a multi-record input block, GET DIAGNOSTICS is used to determine which input records, if any, were not successfully processed -- just as is done for multi-row INSERT).
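
    Here is a minimal sketch of the kind of MERGE Dave described (my example, not DST code -- the table, columns, and host variable arrays are hypothetical), with the matching condition on a unique key and the NOT ATOMIC option specified:

      MERGE INTO ACCOUNT AS T
        USING (VALUES (:ACCT-ID-ARR, :ACCT-BAL-ARR)
               FOR :ROW-CNT ROWS) AS S (ACCT_ID, ACCT_BAL)
        ON T.ACCT_ID = S.ACCT_ID
        WHEN MATCHED THEN
          UPDATE SET T.ACCT_BAL = S.ACCT_BAL
        WHEN NOT MATCHED THEN
          INSERT (ACCT_ID, ACCT_BAL)
          VALUES (S.ACCT_ID, S.ACCT_BAL)
        NOT ATOMIC CONTINUE ON SQLEXCEPTION;
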
  • SELECT FROM UPDATE/INSERT/DELETE/MERGE is great for efficiently obtaining DB2-generated or DB2-modified values. DST has used the SELECT FROM data-changing-statement syntax (introduced for INSERT with DB2 Version 8, and expanded to other data-changing statements with DB2 9) to obtain values generated by BEFORE triggers on DB2 tables (as an aside, Dave mentioned that DST has used triggers to, among other things, dynamically change a program's commit frequency). DST has also found it useful to execute SELECT FROM MERGE statements with the INCLUDE option (enabling return of values not stored in a target table) to determine whether rows in a MERGE input block were inserted or used to update the target table.
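
    The INCLUDE technique might look roughly like this (again a hypothetical sketch of mine -- the INCLUDE column ACTION exists only in the result set, not in the target table, and I'm assuming the same include-column placement that INSERT and UPDATE use):

      SELECT ACCT_ID, ACTION
        FROM FINAL TABLE
          (MERGE INTO ACCOUNT AS T
             INCLUDE (ACTION CHAR(1))
             USING (VALUES (:ACCT-ID-ARR, :ACCT-BAL-ARR)
                    FOR :ROW-CNT ROWS) AS S (ACCT_ID, ACCT_BAL)
             ON T.ACCT_ID = S.ACCT_ID
             WHEN MATCHED THEN
               UPDATE SET T.ACCT_BAL = S.ACCT_BAL, ACTION = 'U'
             WHEN NOT MATCHED THEN
               INSERT (ACCT_ID, ACCT_BAL, ACTION)
               VALUES (S.ACCT_ID, S.ACCT_BAL, 'I'));
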
  • When will you use the new DB2 9 XML data type? You'll use it, Dave said, when "your clients want to exchange information with you in the form of XML documents." In other words, you're likely to use it when your company's clients make XML data exchange a requirement for doing business. DST is using DB2 9 pureXML now. Even if your clients aren't asking for XML yet, you might want to get ready to use it, in case you eventually need to. Being prepared could make exploitation of the technology an easier process (and it is pretty amazing what DB2 can do with XML documents, in terms of indexability, schema validation, and search and retrieval using XQuery expressions embedded in SQL statements).
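
    On the indexability and search points, DB2 9 lets you index into stored XML documents and then use an XPath predicate in an ordinary SQL statement -- something along these lines (hypothetical names, my sketch):

      CREATE INDEX CUST_PHONE_IX ON CUSTOMER (CUST_DOC)
        GENERATE KEY USING XMLPATTERN '/customer/phone'
        AS SQL VARCHAR(20);

      SELECT CUST_ID
        FROM CUSTOMER
        WHERE XMLEXISTS('$d/customer[phone = "555-0100"]'
                        PASSING CUST_DOC AS "d");
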
That's a wrap for this multi-part post. I hope that part 4 has provided you with some useful information, and I invite you to check out parts 1, 2, and 3, if you haven't already done so. The IDUG 2011 North American Conference will be held in Anaheim, California next May. I'm planning on being there, and I hope that many of you will be there as well. It's always a great source of DB2 "news you can use."
