How can I improve the performance of a function that reads specific records from a large file?

Date:1 November 2004
Product/Release:LANSA for iSeries
Abstract:How can I improve the performance of a function that reads specific records from a large file?
Submitted By:LANSA Services
Last Review:November 2010

Question:

A simple function to select specific records from a large file (48,000,000 records) is taking on average 1 hour and 10 minutes to execute in batch. Here is a copy of the function:

FUNCTION OPTIONS(*DIRECT) 
DEFINE FIELD(#WCOUNT) TYPE(*DEC) LENGTH(013) DECIMALS(0) 
DEFINE FIELD(#WBEGTIME) TYPE(*CHAR) LENGTH(006) 
DEFINE FIELD(#WENDTIME) TYPE(*CHAR) LENGTH(006) 
DEF_LINE NAME(#BEGTIME) FIELDS(#WBEGTIME) 
DEF_LINE NAME(#TOTAL) FIELDS(#WCOUNT) 
DEF_LINE NAME(#ENDTIME) FIELDS(#WENDTIME) 
CHANGE FIELD(#WCOUNT) TO(*ZEROS) 
CHANGE FIELD(#WBEGTIME) TO(*TIMEC) 
PRINT LINE(#BEGTIME) 
SELECT FIELDS(*ALL) FROM_FILE(GINVHID) WHERE('#IHDQSHP > 1000') 
CHANGE FIELD(#WCOUNT) TO('(#WCOUNT + 1)') 
ENDSELECT 
PRINT LINE(#TOTAL) 
CHANGE FIELD(#WENDTIME) TO(*TIMEC) 
PRINT LINE(#ENDTIME) 
ENDPRINT

How can you improve the performance of this function?

Answer:

As it is written, this algorithm will not run well on iSeries at all, regardless of whether you wrote it in Cobol or RPG or LANSA. The performance would be poor for any large record set, because it is set up to read every record in the file.

For example, let's pretend that "GINVHID" has 10 records, having "IHDQSHP" values of : { 3500, 12, 34, 1090, 1, 7, 1, 22, 45, 98 }

The original algorithm would read { 3500, 12, 34, 1090, 1, 7, 1, 22, 45, 98 }, and would print a WCOUNT of 2 { 3500, 1090 }

Instead, you should use keyed access to minimize your database I/O.

If we assume that only a small subset of records in "GINVHID" satisfy the condition "IHDQSHP > 1000", then the aim is to read only those records.

  1. Create a logical file over "GINVHID", let's call it "GINVHIDL1". Have "GINVHIDL1" keyed by "IHDQSHP" in ascending order.
     
  2. Change the RDML as follows:

CHANGE FIELD(#MINQSHP) TO(1000) SELECT 
FIELDS(*ALL) FROM_FILE(GINVHIDL1) 
WITH_KEY(#MINQTYSHP) OPTIONS(*STARTKEY)
CHANGE FIELD(#WCOUNT) TO('(#WCOUNT + 1)') ENDSELECT

The logical file "GINVHIDL1" has the records logically re-sequenced or indexed as { 1, 1, 7, 12, 22, 34, 45, 98, 1090, 3500 }

The SELECT's WITH_KEY positions the I/O pointer immediately to "1000". Without the *STARTKEY option, the SELECT command would be looking for only records with IHDQSHP = 1000, since that's the key, and there are no such records. The *STARTKEY option tells the SELECT command to start at 1000, but then keep reading through to the end of the file. Since the logical file is keyed by IHDQSHP in ascending order, all such records fit the condition to be counted/printed, and are therefore worth reading.

So, the new algorithm would only read { 1090, 3500 }, and would print a WCOUNT of 2 { 1090, 3500 }.

For 48,000,000 records, if only 0.5 percent have "IHDQSHP" values of 1000 or more, the new algorithm should run in 21 seconds, instead of 70 minutes.

It is strongly recommended that you to read more about the SELECT command in the LANSA technical guide. SELECT is one of the most important commands to fully understand.