10.1.C Index Performance Issue

John Sprague

New Member
We are converting from Progress 9.1E to OpenEdge 10.1C. We have a program which takes about 47 minutes to run in 9.1E but almost 2.5 hours in 10.1C. I have determined that the BY clauses in the following FOR EACH loop are the cause of the problem.

for each account no-lock,
first customer where customer.customer# = account.customer# no-lock
by customer.last-name
by customer.first-name:

In 10.1C, removing the two BY clauses speeds up the program so it runs in 24 minutes or less. I have compiled the program in both 9.1E and 10.1C and the XREF listings for this specific code are identical. The customer WHERE clause uses the same index in both, and the two BY clauses both show SORT-ACCESS. Indexes exist for both the customer WHERE clause and the two BY clauses.

It appears that 9.1E uses both the index for the customer WHERE clause and the index for the two BY clauses, while 10.1C uses an index for the customer WHERE clause but not for the two BY clauses.

The 10.1C ABL Triggers and Indexes manual states on page 12, case 1: "If there is a BY field clause and the field is indexed, ABL uses the index to sort returned records as long as there is no index on the WHERE clause. If the field is not indexed, ABL creates a temporary sort table and sorts the records at run time."

Based on this, it appears that 10.1C will not use an index on the BY clause if the WHERE clause uses an index.

Is this correct?

Was there a change between 9.1E and 10.1C to cause the processing time to skyrocket?
 
You have installed the latest service pack for 10.1C?
Are you using the same database and database settings for both tests? (Settings such as -B might impact performance a lot.) The same applies for the disk subsystem, and for whether the program is run remotely or locally as a self-service client.
What OS are you on?
What is the system doing during the test with 10.1C? What are the bottlenecks: CPU, disk I/O, memory? And how does this compare to the test with 9.1E?
Whether the temp files used for the sort reside on slow disks or in memory can make a major difference in performance on large data sets.

Differences this big in performance most of the time come down to comparing different data, comparing different test situations, or a bug in Progress ;)

Casper.
 
I'm not aware of any specific change either deliberate (a feature) or accidental (a bug) that would cause this. It would certainly be worth running it past tech support though.

Questions:

1) Do you still have both environments available for testing?

2) What happens if you change the BY clause in v9? Does it also get faster there?

3) How was the db upgraded? conv910? or dump & load?

4) Is the query using db tables or temp-tables?

5) As Casper says the environment can also be contributing a lot to this. Startup parameters, disk assignments and so forth have a lot of influence on longer queries. What else has changed in the upgrade? (It is very unlikely that the answer is "nothing".)
 
We are making multiple changes with this upgrade. We are replacing our current box, upgrading the OS and the database.

Our current box is a SUN SPARC running Solaris 8 with Progress 9.1E.

Our new box is a newer SUN SPARC running Solaris 10 with OpenEdge 10.1C. The latest service pack for 10.1C has been applied. The database was installed on the new box and the data loaded via a dump & load.

Both boxes use a SAN system for storage (the same SAN). Our application is character based. All users log in to the SUN box. Here are our startup parameters for both 9.1E and 10.1C:

9.1E Startup Parameters (from PROMON)
-B 50000
-bibufs 120
-aibufs 180
-G 60
-i Not Enabled
-Bpmax 64
-L 8192
-Ma 5
-Mf 3
-Mn 16
-n 211
-r -R Raw

10.1C Startup Parameters (from PROMON)
-B 50000
-bibufs 120
-aibufs 180
-Mxs 105
-G 0
-i Not enabled
-Bpmax 64
-L 8192
-Ma 5
-maxxids 0
-Mf 3
-Mn 16
-n 211
-r -R Raw
-nolru No

We have a consultant doing the actual work. He has worked with us before and has done multiple db upgrades for us. He wants to make sure the issue is resolved before we go live so we don't have any surprises. He is not sure what the issue is, a new bug or a new strange behavior in 10.1C.

We did have problems earlier with the performance of 10.1C but have fixed the issues. Changes were made to the server settings and a dump & load was done. This issue appears to be the only performance issue remaining.

I will test the program with the BY clauses removed on the 9.1E box to see what the performance is. I have to wait until we get past month end.

The query uses the database tables and not temp-tables. I do have a workaround for the issue but would prefer not to use it. The workaround is to remove the BY clauses from the query and write the data to a temp-table indexed on the two BY-clause fields. This fixes the performance issue. The issue has also appeared in a different program involving different tables and indexes.
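For reference, the temp-table workaround described above would look roughly like this (a sketch only; the temp-table, field, and index names are illustrative, not the actual program):

```progress
/* Build an indexed temp-table from the unsorted join, then read it
   back in index order instead of using BY on the main query. */
DEFINE TEMP-TABLE ttCust NO-UNDO
    FIELD last-name  AS CHARACTER
    FIELD first-name AS CHARACTER
    FIELD customer#  AS INTEGER
    INDEX byName IS PRIMARY last-name first-name.

FOR EACH account NO-LOCK,
    FIRST customer NO-LOCK
    WHERE customer.customer# = account.customer#:
    CREATE ttCust.
    ASSIGN ttCust.last-name  = customer.last-name
           ttCust.first-name = customer.first-name
           ttCust.customer#  = customer.customer#.
END.

FOR EACH ttCust:  /* the byName index supplies the sort order */
    /* ... original per-customer processing ... */
END.
```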

The problem appears to be that OE 10.1C is not using the second index to do the sort and is sorting on the fly. It appears that I can have the WHERE clause be indexed but not the BY clauses, or have the BY clauses be indexed but not the WHERE clause.

Thank you for all assistance.
 
Solaris 10 is a fairly weird OS. They changed all sorts of things about resource allocation and put in some whacked out defaults for a lot of the new stuff. (I'm short on details at the moment -- I don't see much Solaris these days.)

Ok, the SAN is the same but are the filesystems and RAID groups the same configurations? Or is, perhaps, the old stuff RAID 10 and the new stuff RAID 5?

You do have some interesting Progress parameter differences but nothing that would explain the behavior that you describe.

On the other hand... I don't see -spin in your list. Why is that? Is this a "workgroup" database? (With -n 211 that seems unlikely -- which begs the question of what is -spin?)

Is -r being used or not? Your description is unclear.

There are a lot of other missing parameters that might be important. For the most part these look like db startup parameters. What about client session parameters? Those have a very large impact on query performance -- especially if client-side sorting occurs. (Perhaps, for instance, your v9 sessions have -TB 31 -TM 32 and your OE 10 sessions do not. Or maybe -T points to a RAID 10 for v9 and a RAID 5 for OE 10...)

Or maybe the v9 system is self-service while the OE 10 system is client-server (-H and -S are used on the session connect parameters...).

Are you certain that the indexes on the tables in question are the same? Perhaps something got mangled in the dump & load and either the indexes didn't get created or they didn't get activated or they are defined slightly differently? Or maybe they ended up in the wrong storage area?
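To rule out a mangled dump & load, the schema tables can confirm that each index exists, is active, and has the expected component fields. A sketch (run in the Procedure Editor; "customer" stands in for each table involved in the problem query):

```progress
/* List every index on the customer table, whether it is active,
   and its component fields, straight from the schema tables. */
FOR EACH _File NO-LOCK WHERE _File._File-Name = "customer",
    EACH _Index NO-LOCK WHERE _Index._File-recid = RECID(_File):

    DISPLAY _Index._Index-Name _Index._Active.

    FOR EACH _Index-Field NO-LOCK
        WHERE _Index-Field._Index-recid = RECID(_Index),
        FIRST _Field NO-LOCK
        WHERE RECID(_Field) = _Index-Field._Field-recid:
        DISPLAY _Field._Field-Name.
    END.
END.
```

Run it against both the 9.1E and the 10.1C databases and diff the output.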

What is the db block size? Did it change in the conversion?

How about the bi cluster size?

How about your storage areas? Did you change anything there? Are you using type 2 areas?
 
Sorry for the late response, I was out of the office last week.

We are using RAID 1. Both servers use the same RAID.

Here are revised Startup Parameters:
OE 10.1C:
-B 50000 # Database buffers 8kb each
-directio # Use directio vs. OSYNC I/O
-bibufs 120 # Before image buffers
-aibufs 180 # After image buffers
-n 210 # Maximum number of users
# -n 110 # Maximum number of users
-S emwd # Service name (See: /etc/services)
-Mn 16 # Max. # servers per db (total of -Mpb + 1)
-Mpb 10 # Maximum number of servers per broker
-Ma 5 # Maximum number of clients per server
-spin 4000 # Latch retry count

Progress 9.1E
-B 50000 # Database buffers 8kb each
-directio # Use directio vs. OSYNC I/O
-bibufs 120 # Before image buffers
-aibufs 180 # After image buffers
-n 210 # Maximum number of users
# -n 110 # Maximum number of users
-S emwd # Service name (See: /etc/services)
-Mn 16 # Max. # servers per db (total of -Mpb + 1)
-Mpb 10 # Maximum number of servers per broker
-Ma 5 # Maximum number of clients per server
-spin 4000 # Latch retry count

Here are our client parameters:

OE 10.1C
mpro $DB -e 100 -l 200 -p login.p (this is our login program to our system)

Progress 9.1E
mpro $DB -e 100 -l 200 -p login.p (this is our login program to our system)

I have talked with Progress Tech Support and they guided me to the -Bt and -tmpbsize parameters. We are experimenting with different values to see if there is an impact here.

We are using Type 2 areas. The db block size is 8K on both systems. We changed it on the new server before the dump and load was done.

Our consultant told us that there were things that we could do in 9.1E that we can't do in 10.1C. This may be one of them. We have found a couple of other programs which perform much slower on 10.1C. The fix appears to be to add a new index which greatly improves the performance as it is now looking at a much smaller group of records than the previous index did.
 
Sorry for the late response, I was out of the office last week.

We are using RAID 1. Both servers use the same RAID.

It's not that I disbelieve you, maybe you're the exception, but the RAID configuration of SAN devices is, in my experience, the #1 most frequently mis-reported configuration detail. It's worth finding a way to independently double check.

I spent a week or so with a customer a while ago digging in to why their shiny new and very expensive Famous Vendor SAN was similarly slower than their older and much less fancy SAN. That also involved an older version of Progress running against the old SAN and 10.1X running against the new SAN and issues with a few specific queries. They were certain that there must be a problem with Progress 10.

To make a long story short:

1) The old SAN was in production and very active. The queries being tested were pretty much guaranteed to be in every cache available.

2) The new SAN was only attached to a new server running the new version of Progress with a few users testing things. Frequent restarts to change parameters and run new tests were the order of the day. Needless to say keeping data in the caches for very long was a challenge.

3) The issue that caused all the concern was specifically in regard to how long the query took to run the first time. After the first run it was always very good. (I haven't asked but do you maybe have a similar issue in play? Does your response time improve if you run the 10.1C query again after the first attempt?)

4) Restarting the old SAN to prove that the cache was providing the data wasn't allowed. It was, after all, running production.

5) Nonetheless, we were eventually able to show that the behavior in question, first of all, wasn't Progress-specific (we ran some simple C programs to fetch data randomly from large files) and, second, was very much due to differences in the SANs' behaviors (that involved some very high-powered test equipment and engineers from the SAN vendor).

5b) (The Famous SAN Vendor was quite unmotivated to help in any way until the vendor of the new server hardware had a truck show up with one of their SANs on it as a "loaner" -- the loaner SAN kicked the Famous SAN Vendor's SAN's butt in every imaginable way...)

6) The customer did finally convert and once the system was live the "problem" went away.

Aside from that the number of times that discrepancies turn up between what a "storage engineer" reports the configuration as and what is actually configured is non-trivial.

So humor me and double-check the assertion that the SAN config is the same. (The best method would be to run some non-Progress disk-intensive processes...)

Here are revised Startup Parameters:
OE 10.1C:
-B 50000 # Database buffers 8kb each

Kind of small. 400MB isn't a very big buffer cache.

-directio # Use directio vs. OSYNC I/O

Probably makes no difference to anything.

-bibufs 120 # Before image buffers
-aibufs 180 # After image buffers

Probably much higher than they need to be but basically not a problem. Just a slight waste of memory.

-n 210 # Maximum number of users
# -n 110 # Maximum number of users
-S emwd # Service name (See: /etc/services)
-Mn 16 # Max. # servers per db (total of -Mpb + 1)
-Mpb 10 # Maximum number of servers per broker
-Ma 5 # Maximum number of clients per server
-spin 4000 # Latch retry count

Progress 9.1E
-B 50000 # Database buffers 8kb each
-directio # Use directio vs. OSYNC I/O
-bibufs 120 # Before image buffers
-aibufs 180 # After image buffers
-n 210 # Maximum number of users
# -n 110 # Maximum number of users
-S emwd # Service name (See: /etc/services)
-Mn 16 # Max. # servers per db (total of -Mpb + 1)
-Mpb 10 # Maximum number of servers per broker
-Ma 5 # Maximum number of clients per server
-spin 4000 # Latch retry count

Here are our client parameters:

OE 10.1C
mpro $DB -e 100 -l 200 -p login.p (this is our login program to our system)

Progress 9.1E
mpro $DB -e 100 -l 200 -p login.p (this is our login program to our system)

Why are -S, -Mn, -Ma and -Mpb being specified on the server if there are no clients using -H and -S? Do you have a bunch of ODBC clients?

-e has been obsolete since v7. Its function was replaced with -mmax which is generally set much higher, -mmax 4096 would be typical for a self-service character client.

Where are the -T files going? This could be important. The BY may be causing client side sorting and if the filesystem or LUNs that the -T files reside on are configured differently you could have very different performance. (It's not just the SAN that you need to worry about -- the filesystems could be different at the OS level as well....)
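An easy way to answer that: the effective -T location for a running client can be checked from ABL itself, since the SESSION handle's TEMP-DIRECTORY attribute reflects the -T setting in effect (a quick sketch):

```progress
/* Show where this session's sort/temp files (DBI*, srt*) are written. */
DISPLAY SESSION:TEMP-DIRECTORY FORMAT "x(60)".
```

Run it once in each environment and compare the filesystems behind the two paths.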

I have talked with Progress Tech Support and they guided me to the -Bt and -tmpbsize parameters. We are experimenting with different values to see if there is an impact here.

-tmpbsize changed from 1 in v9 to 4 in OE 10.1C. But that wouldn't affect the code that your original post shows. It does have quite an effect on the size of temp files in the -T directory (the DBI* files if you have -t enabled) and it will also have quite an impact on RAM utilization (which might indirectly affect overall performance by causing paging -- but again that wouldn't affect just a particular query).

We are using Type 2 areas. The db block size is 8K on both systems. We changed it on the new server before the dump and load was done.

Did you put the indexes in a separate area? What is the cluster size of the area holding the tables and the indexes involved in the problem query? Likewise, what is the rows-per-block setting for that area? (Except for cluster size these questions apply to both v9 and OE 10 -- differences in rows per block could, for instance, make a big difference.)

Our consultant told us that there were things that we could do in 9.1E that we can't do in 10.1C.

That's backwards. Hopefully it's a typo. If not you might want to consider a new consultant -- see sig line ;)

This may be one of them. We have found a couple of other programs which perform much slower on 10.1C. The fix appears to be to add a new index which greatly improves the performance as it is now looking at a much smaller group of records than the previous index did.

Yes, that is often the fix for slow queries. But the question remains regarding why 10.1C would seem to be slower than 9.1E if everything else was, in theory anyway, the same. Did you ever get around to testing by removing the BY in v9?
 