data extract via ODBC... performance problems

TomBascom

Curmudgeon
It could take quite a while to run in that environment.

As I guessed, everything is stuffed into the schema area. This is very, very bad for performance. For one thing, it means that the SQL-92 no-index table scan cannot be used, which is probably why the optimizer wasn't much help.
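
In the meantime, if you stay on ODBC for the extract, it is also worth making sure the SQL-92 optimizer actually has statistics to work with -- they are not maintained automatically. A minimal sketch via the OpenEdge SQL client; the port, user, password and table name below are placeholders, not your actual values:

Code:
-- Connect with the OpenEdge SQL client first, e.g. (all placeholders):
--   sqlexp -db tst_data -S 5162 -user sysprogress -password xxxxx
-- Then refresh the statistics that the cost-based optimizer uses:
UPDATE TABLE STATISTICS AND INDEX STATISTICS
   AND ALL COLUMN STATISTICS FOR PUB.MYTABLE;
COMMIT;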

It does not, however, explain why you didn't see the same problem with version 9. All I can think of in that regard is that something else must be different. Perhaps less data or a different physical disk layout. (You'd be astonished at how many "disk upgrades" are really tremendous steps backwards in performance terms...)

You can change to type 2 areas and fix all of that without having any impact at all on the application (other than dramatically improving performance), but you will need to get beyond the idea that your vendor has control over such things.

It isn't worth discussing any further, though, unless you're going to be willing to make some changes.
 

gah

New Member
The db was upgraded from 9.1E to 10.1C a month ago... onto a new server.

I know that a dump and reload is probably required... but first off I will run dbanalys this evening and see what comes back.

If the table that I am testing with requires an idxbuild then I will do that and retest the ODBC extract to see whether this cures the performance issue.

We do know that a D&L is probably required on a regular basis... but we have limited time at weekends to run it. IIRC, the last time we did it, on the old server and 9.1E, it was touch and go whether we would be up and running in time for 6.00am on Monday!

Maybe it will be better with the new server...will have to test.

Type II storage... could you please explain what these areas are, and possibly why our vendor decided not to use them? Are there any downsides?

Willing to make changes... not a problem as long as we do not fall foul of our very costly maintenance agreement!
Regards, gah
 

TomBascom

Curmudgeon
A type 2 storage area is one with a cluster size of 8, 64 or 512.

If an area is a type 2 area then data blocks will contain data from one, and only one, table. This generally results in a very significant performance improvement over a table stored in a type 1 area, especially if a large set of rows is being processed. (In a type 1 area a data block will contain records from many different tables.)

Because all of the blocks in a type 2 area contain data from just a single table many utilities and features can operate on that data much faster. They no longer have to examine every block in the area to find relevant data.

When a type 2 area needs additional disk space it is allocated in cluster-size units. This is much more efficient than the type 1 block-at-a-time allocation strategy.

All new and advanced features of the database engine require type 2 areas. Type 1 areas are supported for compatibility and easy migration, but you should move to type 2 ASAP.

The schema area is always a type 1 area. Therefore no data should ever be stored in the schema area. Nor should any indexes be there.
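
To make that concrete, here is a rough sketch of what type 2 areas look like in a structure (.st) file. Everything here -- area names, numbers, paths, extent sizes -- is a placeholder, not a recommendation for your database:

Code:
# After the colon is the area number; after the comma, records-per-block;
# after the semicolon, the cluster size. A cluster size of 8, 64 or 512
# makes the area type 2; a cluster size of 1 (or none) is type 1.
d "Data":7,64;8 D:\db\tst_data_7.d1 f 1024000
d "Data":7,64;8 D:\db\tst_data_7.d2
d "Index":8,1;8 D:\db\tst_data_8.d1 f 512000
d "Index":8,1;8 D:\db\tst_data_8.d2

Areas like these can be added to an existing database with prostrct add (with the database offline), and the data then moved in via a dump & load or proutil.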
 

TomBascom

Curmudgeon
Your vendor probably has little or no awareness of these things. They generally don't.

There is also a certain amount of thinking and analysis that needs to go into a good storage area design. The lazy and ineffective way to do it is by functional grouping: you put all order-related data in one area, all inventory data in another and so on. That approach totally misses the point from a performance POV. Several famous ERP vendors do it this way.

What you really want to do is to assign data to storage areas based on common characteristics like average row size. You also want to isolate heavily used tables even if they are very small. So having some basic baseline data regarding system usage patterns is quite helpful.

Thinking ahead, you might also isolate historical data and read-only data so that it is positioned to take advantage of features coming in future releases, like table partitioning and secondary buffer pools.
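
As a hedged illustration of that kind of grouping (every name, area number, RPB and cluster size below is invented for the example):

Code:
# Small, hot tables isolated in their own area; remaining tables grouped
# so records-per-block suits their mean row size; indexes kept separate.
d "HotSmall":10,128;8   D:\db\tst_data_10.d1
d "MediumRows":11,32;64 D:\db\tst_data_11.d1
d "LargeRows":12,8;64   D:\db\tst_data_12.d1
d "Indexes":13,1;8      D:\db\tst_data_13.d1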
 

TomBascom

Curmudgeon
The use of storage areas is utterly transparent to the application. It is a logical concept that the application is totally insulated from. R-code contains no area information at all, so you can change them to your heart's content without any impact.

From an administrative POV you will need to pay attention to updates as you get them from your vendor. New tables and indexes won't be aware of the changes that you have made, so you will need to move them to an appropriate area as they come in. If the vendor suddenly discovers storage areas you might also need to override their ideas regarding where to put things. (But I wouldn't hold my breath waiting for that to happen.) If the vendor has a convoluted update process that seeks to force you to use their brain-dead storage area scheme (at least one does) you might have to go to some extra effort to defeat that too. (Your unnamed vendor obviously does not do this, at least not on purpose, since everything is in the schema area.)
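
For the mechanics, moving a newly delivered table and its indexes out of the schema area can be done with proutil. A rough sketch -- the table, index and area names are placeholders, and tablemove can be slow and lock-heavy on big tables, so test it on a copy first:

Code:
# Move the table's records to area "Data" and its indexes to area "Index":
proutil tst_data -C tablemove PUB.NEWTABLE "Data" "Index"
# Or relocate a single index on its own:
proutil tst_data -C idxmove PUB.NEWTABLE.IND_NEW "Index"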
 

TomBascom

Curmudgeon
IMHO an idxbuild is unlikely to make any sort of difference. But feel free to try it, and if it does help, great!
 

gah

New Member
Tom

many thanks for your responses...certainly food for thought here.

Will be having a discussion with the vendor ASAP :)
 

TomBascom

Curmudgeon
BTW -- I happen to have a certain amount of expertise regarding dumping and loading. There's no way that a 30GB db should be taking so long that you need to worry about getting it done on a weekend. Even with Windows, RAID5 and a non-existent storage area configuration you should be able to get it done in well under 48 hours.

My guess is that the previous D&L wasn't very sophisticated. For an overview of a better way, see: Highly Parallel Dump & Load. Of course, I'd also be happy to come in and execute the process on a professional basis ;)
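
The gist, as a hedged sketch (db, table and directory names are placeholders): binary dump the big tables concurrently instead of one after another, then binary load with an in-line index build.

Code:
# Start each dump in its own session/command window so they overlap:
proutil tst_data -C dump MYTABLE  D:\dump
proutil tst_data -C dump MYTABLE2 D:\dump
# Then load each .bd file, building its indexes as it loads:
proutil tst_data -C load D:\dump\mytable.bd  build indexes
proutil tst_data -C load D:\dump\mytable2.bd build indexes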
 

gah

New Member
These are the results for dbanalys... as I am sure you will appreciate, they don't mean a lot to me.
I have only included the relevant results for the table that I am testing on.

any thoughts on these results please?

OpenEdge Release 10.1C03 as of Sun Feb 1 00:37:52 EST 2009
PROGRESS Database Analysis
Database: D:\tst_data
Blocksize: 4096
RecSpaceSearchDepth: 5
Options: chanalys ixanalys tabanalys
Date: Thu Sep 24 18:30:37 2009

CHAIN ANALYSIS FOR AREA "Control Area" : 1
----------------------------------------------------------
FREE CHAIN ANALYSIS
-------------------
0 block(s) found in the free chain.

RM CHAIN ANALYSIS
-----------------
1 block(s) found in the RM chain.

INDEX DELETE CHAIN ANALYSIS
-----------------
0 block(s) found in the Index Delete chain.

AREA "Control Area" : 1 BLOCK ANALYSIS
-------------------------------------------------
7 block(s) found in the area.
Current high water mark: 5
0 free block(s) found in the area
1 record block(s) found in the area
2 index block(s) found in the area
2 empty block(s) found in the area
1 object block(s) found in the area
CHAIN ANALYSIS FOR AREA "Schema Area" : 6
----------------------------------------------------------
FREE CHAIN ANALYSIS
-------------------
0 block(s) found in the free chain.

RM CHAIN ANALYSIS
-----------------
758850 block(s) found in the RM chain.

INDEX DELETE CHAIN ANALYSIS
-----------------
0 block(s) found in the Index Delete chain.

AREA "Schema Area" : 6 BLOCK ANALYSIS
-------------------------------------------------
7168001 block(s) found in the area.
Current high water mark: 6562690
0 free block(s) found in the area
4565418 record block(s) found in the area
1997269 index block(s) found in the area
605311 empty block(s) found in the area
1 object block(s) found in the area
BLOCK ANALYSIS
--------------
1 master block(s) found in the database.
0 area block(s) found in the database.
1 control block(s) found in the database.
2 object block(s) found in the database.



RECORD BLOCK SUMMARY

RECORD BLOCK SUMMARY FOR AREA "Control Area" : 1
-------------------------------------------------------
-Record Size (B)- ---Fragments--- Scatter
Table Records Size Min Max Mean Count Factor Factor
_Area 3 145.0B 45 54 48 3 1.0 1.0
_AreaExtent 19 1.1K 59 63 61 19 1.0 1.0
------------------------------------------------------------
Subtotals: 22 1.3K 45 63 60 22 1.0 1.0


RECORD BLOCK SUMMARY FOR AREA "Schema Area" : 6
-------------------------------------------------------
-Record Size (B)- ---Fragments--- Scatter
Table Records Size Min Max Mean Count Factor Factor
PUB.MYTABLE
2816635 308.8M 98 150 114 2816923 1.0 4.1


4565419 RM block(s) found in the database.
79.67% of the RM block space is used.



INDEX BLOCK SUMMARY

INDEX BLOCK SUMMARY FOR AREA "Control Area" : 1
-------------------------------------------------------
Table Index Fields Levels Blocks Size % Util Factor
_Area
_Area-Number 1027 1 1 1 34.0B 0.8 1.0
_AreaExtent
_AreaExtent-Area 1028 2 1 1 184.0B 4.5 1.0

INDEX BLOCK SUMMARY FOR AREA "Schema Area" : 6
-------------------------------------------------------
Table Index Fields Levels Blocks Size % Util Factor
PUB.MYTABLE
IND_1 362 1 3 1012 2.6M 67.0 1.7
IND_2 363 1 3 980 2.4M 62.8 1.7
IND_3 364 1 3 1134 2.7M 62.5 1.7
IND_4 365 1 3 858 2.7M 82.3 1.4
IND_5 366 2 3 2510 6.7M 69.3 1.6
IND_6 367 1 3 979 2.4M 62.8 1.7
IND_7 368 1 3 1214 2.9M 60.9 1.8
IND_8 361 1 3 6274 24.3M 99.8 1.0
IND_9 369 1 3 8470 24.5M 74.6 1.5

DATABASE SUMMARY

SUMMARY FOR AREA "Control Area" : 1
-------------------------------------------------------
Records Indexes Combined
NAME Size Tot % Size Tot % Size Tot %
_Area 145.0B 0.0 34.0B 0.0 179.0B 0.0
_AreaExtent 1.1K 0.0 184.0B 0.0 1.3K 0.0

SUMMARY FOR AREA "Schema Area" : 6
-------------------------------------------------------
Records Indexes Combined
NAME Size Tot % Size Tot % Size Tot %
PUB.MYTABLE
308.8M 1.6 71.3M 0.4 380.1M 2.0

0 free block(s) found in the database.
0 index table block(s) found in the database.
1 sequence block(s) found in the database.
605313 empty block(s) found in the database.
7168008 total blocks found in the database.
database analysis complete Thu Sep 24 19:04:38 2009
 

TomBascom

Curmudgeon
These are the results for dbanalys... as I am sure you will appreciate, they don't mean a lot to me.

Most of it is noise.

I have only included the relevant results for the table that I am testing on.

any thoughts on these results please?

This stuff is a lot more readable with CODE tags wrapped around it.
Code:
OpenEdge Release 10.1C03 as of Sun Feb 1 00:37:52 EST 2009
PROGRESS Database Analysis
Database: D:\tst_data
Blocksize: 4096
 
RECORD BLOCK SUMMARY FOR AREA "Schema Area" : 6
-------------------------------------------------------
-Record Size (B)- ---Fragments--- Scatter
Table Records Size Min Max Mean Count Factor Factor
PUB.MYTABLE 
2816635 308.8M 98 150 114 2816923 1.0 4.1
 
 
INDEX BLOCK SUMMARY FOR AREA "Schema Area" : 6
-------------------------------------------------------
Table Index Fields Levels Blocks Size % Util Factor
PUB.MYTABLE
IND_1 362 1 3 1012 2.6M 67.0 1.7
IND_2 363 1 3 980 2.4M 62.8 1.7
IND_3 364 1 3 1134 2.7M 62.5 1.7
IND_4 365 1 3 858 2.7M 82.3 1.4
IND_5 366 2 3 2510 6.7M 69.3 1.6
IND_6 367 1 3 979 2.4M 62.8 1.7
IND_7 368 1 3 1214 2.9M 60.9 1.8
IND_8 361 1 3 6274 24.3M 99.8 1.0
IND_9 369 1 3 8470 24.5M 74.6 1.5

SUMMARY FOR AREA "Schema Area" : 6
-------------------------------------------------------
Records Indexes Combined
NAME Size Tot % Size Tot % Size Tot %
PUB.MYTABLE
308.8M 1.6 71.3M 0.4 380.1M 2.0

The standard recommendation is to use a 4k db block with Windows. You're doing that. Under some conditions it might be better to change to 8k blocks -- but I would try other things first.

Utilization of the indexes is a bit low. You might benefit from an idxcompact. Coincidentally, your RPB is actually reasonably well matched to the 4k block size (at a mean row of 114 bytes, roughly 4096 / 114 ≈ 35 records fit in a 4k block), so you wouldn't be wasting lots of space if this were a type 2 area. Your "mytable" records are not fragmented but they are scattered. You've got a lot of indexes (which we don't know the definitions of) so it is very likely that you suffer from "logical scatter", but dbanalys doesn't analyze that.
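
An idxcompact can be run against a live database; a minimal sketch using one of the index names from your output (the trailing number is the target utilization percentage, a placeholder here):

Code:
# Compact one index toward roughly 80% utilization:
proutil tst_data -C idxcompact PUB.MYTABLE.IND_1 80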

Dump & load will improve index utilization and scatter.

Dump & load combined with a redesign of the storage areas would be a big help. But you cannot see that from anything here.
 

gah

New Member
Update...

Looking more closely at where the extract slows down (it is noticeable for all large tables), it would appear that this happens with records written after our previous dump and load.

e.g. Table 1 slows down at 1.28M recs, Table 2 at 3.5M recs, etc... precisely the number of records that were dumped and loaded previously.

We have eventually found a time slot to perform another dump and load on the TEST system (no change to storage types).

The db is approx 30GB and this took 15 hrs to dump and 5 hrs to load with an in-line index build... we were expecting the load to take longer than the dump??

Analysis of the dump logs shows that the dump also slows down at the same number of records as the ODBC rip... i.e. at the point where the last D&L finished. For example, in a table of 2.8M recs, the first 1.28M recs dumped in 19 secs... the other 1.52M recs took another 11 minutes.
This pattern was repeated on all tables.

Is this to be expected?

Incidentally...since the D&L...the ODBC rip has performed well and at a constant speed for the whole table.

Whilst this has solved our immediate problems with ODBC... we are now concerned that we will face the same issues in future months as the db continues to grow.

Any comments / suggestions?
Many thanks
 

TomBascom

Curmudgeon
1) You were expecting the load to take longer than the dump.

That is a common, but mistaken, expectation. Particularly with older legacy databases that use type 1 storage areas.

2) The slowness seems to start at the point where new data gets entered.

That strongly suggests that the new data is being added in a highly scattered manner and that you are using a poorly structured database. The data immediately post dump & load is probably very sequential and tightly packed. Especially if you do a single threaded load and use type 1 areas. Once normal processing starts new records will be added mixed in with records from many other tables and those sequential characteristics are lost.

Moving to properly configured type 2 areas will keep your data organized and greatly reduce the need to dump & load.
 

gah

New Member
Thanks for clearing up my misconception re load times.

Re type 2 areas... what improvement in query response time could we expect from moving from type 1 to type 2 in the main 4GL Progress application?
The vendor indicates maybe 10%... but they have also said previously that they don't know if we will get any benefit at all... not particularly helpful!

We have to balance any improvement from type 2 against the increase in workload for us in managing the monthly patch releases and the continually growing number of tables and indexes created by the vendor.

At the moment the vendor says they do not support type 2 by default... i.e. more support cost to us if we do wish to move to type 2.

Sad times, methinks.

thanks for the continued interest.
 

gah

New Member
Another issue with the D&L was that, after we performed it, we still had a scatter factor over 1.0... see the results below as an example.

We had followed the vendor's supplied script, which loaded tables in alphabetical name order... the vendor now tells us that we need to reorder the load script to load the smallest tables first... hey ho.

Will be testing again this weekend, hopefully... but before we do... any suggestions?

Code:
RECORD BLOCK SUMMARY FOR AREA "Schema Area" : 6
-------------------------------------------------------
                                       -Record Size (B)- ---Fragments--- Scatter
Table                Records    Size   Min   Max  Mean      Count Factor  Factor
PRE DUMP AND LOAD
PUB.TABLE            2816635  308.8M    98   150   114    2816923    1.0     4.1
POST DUMP AND LOAD
PUB.TABLE            2816635  308.8M    98   150   114    2816635    1.0     1.7

apologies if I should have posted this as a new thread.
 

TomBascom

Curmudgeon
The first thing you should do when you need to get out of a hole is to stop digging. Your vendor really needs to put the shovel down and get some education.

"10%" is just a "I don't really know but I have to say something" response.

Properly implemented type 2 areas are absolutely huge for performance. There is zero impact on your vendor or the functionality of their application -- you could do it yourself and they would never know (except perhaps that you stop calling to complain about performance).

How "huge" is "huge"? It depends. If you already have a well tuned system and you have isolated troublesome tables and indexes in dedicated type1 areas (this is a way to fake type 2 areas in v9) then it probably isn't really going to be all that much -- in spite of my saying that it is huge. But you do not have that. You have a fairly horridly tuned system with a dreadful storage arrangement. Of course you've also just done a dump & load so, for the moment, things are "ok". But it will not last (as you have discovered) and doing dump and loads to fix performance problems gets tiresome after a while.
 

TomBascom

Curmudgeon
I didn't really answer how "huge" "huge" could be if one were to have a fairly fouled-up configuration and fix it.

10x is well within range. IOW if you have a query that now takes 10 minutes it could be reduced to 1 minute. Or less. Possibly much less. I've seen things that took hours cut down to seconds. I cannot, however, guarantee that that is the case in your situation. I wouldn't rule it out though.

Database performance tuning is something that application vendors are very bad at.

Some of them are infamous for stonewalling customers. One of their favorite tactics is to threaten "no support" if you do not hire their people or if you deviate in any way from their "standard". This is obviously crap. You are the customer and you have the power of the purse. You may need to remind someone in your organization which way the power flows but, in the end, your organization is in charge. Not the other way around.
 