Sorry for the late response, I was out of the office last week.
We are using RAID 1. Both servers use the same RAID.
It's not that I disbelieve you -- maybe you're the exception -- but the RAID configuration of SAN devices is, in my experience, the #1 most frequently mis-reported configuration detail. It's worth finding a way to independently double-check it.
I spent a week or so with a customer a while ago digging into why their shiny new and very expensive Famous Vendor SAN was similarly slower than their older and much less fancy SAN. That also involved an older version of Progress running against the old SAN, 10.1X running against the new SAN, and issues with a few specific queries. They were certain that there must be a problem with Progress 10.
To make a long story short:
1) The old SAN was in production and very active. The queries being tested were pretty much guaranteed to be in every cache available.
2) The new SAN was only attached to a new server running the new version of Progress with a few users testing things. Frequent restarts to change parameters and run new tests were the order of the day. Needless to say keeping data in the caches for very long was a challenge.
3) The issue that caused all the concern was specifically in regard to how long the query took to run the first time. After the first run it was always very good. (I haven't asked, but do you maybe have a similar issue in play? Does your response time improve if you run the 10.1C query again after the first attempt?)
4) Restarting the old SAN to prove that the cache was providing the data wasn't allowed. It was, after all, running production.
5) Nonetheless we were eventually able to show that the behavior in question, first of all, wasn't Progress-specific (we ran some simple C programs to fetch data randomly from large files) and, second, was very much due to differences in the two SANs' behaviors (that involved some very high-powered test equipment and engineers from the SAN vendor).
5b) (The Famous SAN Vendor was quite unmotivated to help in any way until the vendor of the new server hardware had a truck show up with one of their SANs on it as a "loaner" -- the loaner SAN kicked the Famous SAN Vendor's SAN's butt in every imaginable way...)
6) The customer did finally convert and once the system was live the "problem" went away.
Aside from that, the number of times that a discrepancy turns up between what a "storage engineer" reports the configuration to be and what is actually configured is non-trivial.
So humor me and double-check the assertion that the SAN config is the same. (The best method would be to run some non-Progress disk-intensive processes...)
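If it helps, here is a minimal sketch of the kind of simple C program I mean (along the lines of the ones mentioned in the story above). The file name, read count, and 8 KB block size are just illustrative assumptions. Point it at a file on each SAN that is bigger than any cache in the path and compare the numbers:

/* random-read sketch: read 8 KB blocks from random offsets in a big file */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/time.h>

#define BLOCK_SIZE 8192

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <big-file> <number-of-reads>\n", argv[0]);
        return 1;
    }

    long nreads = atol(argv[2]);
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    long nblocks = st.st_size / BLOCK_SIZE;
    if (nblocks < 1) { fprintf(stderr, "file too small\n"); return 1; }

    char buf[BLOCK_SIZE];
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    srand((unsigned) t0.tv_usec);

    for (long i = 0; i < nreads; i++) {
        /* pick a random block and read it */
        long blk = (long) (((double) rand() / RAND_MAX) * (nblocks - 1));
        if (pread(fd, buf, BLOCK_SIZE, (off_t) blk * BLOCK_SIZE) < 0) {
            perror("pread");
            return 1;
        }
    }

    gettimeofday(&t1, NULL);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%ld random %d byte reads in %.2f sec (%.1f reads/sec)\n",
           nreads, BLOCK_SIZE, secs, nreads / secs);

    close(fd);
    return 0;
}

Run it twice on each system -- if the second run is dramatically faster than the first you are mostly measuring cache, which is exactly the sort of thing that fooled the customer in the story above.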
Here are revised Startup Parameters:
OE 10.1C:
-B 50000 # Database buffers 8kb each
Kind of small. 400MB isn't a very big buffer cache.
-directio # Use directio vs. OSYNC I/O
Probably makes no difference to anything.
-bibufs 120 # Before image buffers
-aibufs 180 # After image buffers
Probably much higher than they need to be but basically not a problem. Just a slight waste of memory.
-n 210 # Maximum number of users
# -n 110 # Maximum number of users
-S emwd # Service name (See: /etc/services)
-Mn 16 # Max. # servers per db (total of -Mpb + 1)
-Mpb 10 # Maximum number of servers per broker
-Ma 5 # Maximum number of clients per server
-spin 4000 # Latch retry count
Progress 9.1E
-B 50000 # Database buffers 8kb each
-directio # Use directio vs. OSYNC I/O
-bibufs 120 # Before image buffers
-aibufs 180 # After image buffers
-n 210 # Maximum number of users
# -n 110 # Maximum number of users
-S emwd # Service name (See: /etc/services)
-Mn 16 # Max. # servers per db (total of -Mpb + 1)
-Mpb 10 # Maximum number of servers per broker
-Ma 5 # Maximum number of clients per server
-spin 4000 # Latch retry count
Here are our client parameters:
OE 10.1C
mpro $DB -e 100 -l 200 -p login.p (this is our login program to our system)
Progress 9.1E
mpro $DB -e 100 -l 200 -p login.p (this is our login program to our system)
Why are -S, -Mn, -Ma and -Mpb being specified on the server if there are no clients using -H and -S? Do you have a bunch of ODBC clients?
-e has been obsolete since v7. Its function was replaced with -mmax, which is generally set much higher; -mmax 4096 would be typical for a self-service character client.
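For example, something like this -- just your existing login line with -e dropped and -mmax added (4096 is only a suggested starting point, not something I know fits your application):

mpro $DB -mmax 4096 -l 200 -p login.p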
Where are the -T files going? This could be important. The BY may be causing client-side sorting, and if the filesystem or LUNs that the -T files reside on are configured differently you could have very different performance. (It's not just the SAN that you need to worry about -- the filesystems could be different at the OS level as well....)
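If you aren't sure where they are going, one option is to make it explicit and make the temp files visible while the session runs (the directory name is just an example; -t keeps the client's temporary files visible in the -T directory instead of unlinking them):

mpro $DB -T /some/local/filesystem -t -l 200 -p login.p

Then, while the problem query is running, look at the DBI* and srt* files in that directory on each system and see how big they get and where that filesystem actually lives.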
I have talked with Progress Tech Support and they guided me to the -Bt and -tmpbsize parameters. We are experimenting with different values to see if there is an impact here.
-tmpbsize changed from 1 in v9 to 4 in OE 10.1C. But that wouldn't affect the code that your original post shows. It does have quite an effect on the size of temp files in the -T directory (the DBI* files if you have -t enabled) and it will also have quite an impact on RAM utilization (which might indirectly affect overall performance by causing paging -- but again that wouldn't affect just a particular query).
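One cheap experiment while you are at it: start a 10.1C client with -tmpbsize set back to the old v9 value of 1 and see whether anything changes. The values below are purely illustrative:

mpro $DB -Bt 512 -tmpbsize 1 -l 200 -p login.p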
We are using Type 2 areas. The db block size is 8K on both systems. We changed it on the new server before the dump and load was done.
Did you put the indexes in a separate area? What is the cluster size of the area holding the tables and the indexes involved in the problem query? Likewise, what is the rows per block for that area? (Except for cluster size these questions apply to both v9 and OE 10 -- differences in rows per block could, for instance, make a big difference.)
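An easy way to double-check, rather than relying on memory (this works on both servers and doesn't change anything -- prostrct list just regenerates the structure description from the database):

prostrct list $DB

In the resulting .st file a Type 2 data area line looks something like this (the name, area number, and path here are made up -- what matters is the 64 rows per block and the 512 blocks per cluster after the semicolon):

d "Data":10,64;512 /path/to/db_10.d1

The v9 structure file shows rows per block the same way but has no cluster size.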
Our consultant told us that there were things that we could do in 9.1E that we can't do in 10.1C.
That's backwards. Hopefully it's a typo. If not, you might want to consider a new consultant -- see sig line.
This may be one of them. We have found a couple of other programs which perform much slower on 10.1C. The fix appears to be to add a new index which greatly improves the performance as it is now looking at a much smaller group of records than the previous index did.
Yes, that is often the fix for slow queries. But the question remains regarding why 10.1C would seem to be slower than 9.1E if everything else was, in theory anyway, the same. Did you ever get around to testing by removing the BY in v9?