Scatter factors / performance after a D/L.

ron

Member
Hi,

I'm working with a system that is widely dispersed - over 200 small servers over a large geographical area. Most are UnixWare / Progress 9.1D - but migration is under-way to Linux / Progress 10.2B. (All licences at remote sites are Workgroup; AI is in use everywhere.)

During the past year we've introduced an automated (well, semi-automated, anyway) process where the remote DBs get dumped/reloaded about every nine months. Before we did this users were very unhappy about performance - but now very much happier.

But there is something rather odd. At many sites the users start complaining about performance again only four or five months after a D/L - and if I do another D/L they are suddenly "happy" again. But when I look at a tabanalys report the scatter factors max out at something like 1.2, which is "puny".

I'm getting some pressure to do the D/L cycle more frequently than every (about) nine months - but I am very reluctant. I am 99.99% sure that the problem here is somewhere in the application - but I don't have access to the code.

What I'd like to do is produce details of scatter factors for some of the large tables by "section" (say, each 10% of the table) rather than over the entire table (as the tabanalys report provides). I think it is possible that some high-hit tables may get very fragmented "at the end".

a. Any suggestions as to the kind(s) of problems I might look for?
b. Can anyone tell me the algorithm to calculate scatter factors from a stream of recids?
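On (b): as far as I know Progress has never published the exact formula behind the tabanalys scatter factor, so the sketch below is only a proxy, not the real algorithm. It assumes recids map to database blocks by integer division by the records-per-block setting (a simplification of the real V9 recid layout - adjust `records_per_block` for your block size), and it reports, per 10% section of the table, the mean block-to-block jump between consecutive records read in index order. Near 0 means well-clustered; large values mean the section is scattered.

```python
def section_scatter(recids, records_per_block=32, sections=10):
    """Per-section block-locality proxy for a table, given its recids
    in primary-index order.

    NOT Progress's published scatter formula (there isn't one); this
    reports, for each section, the mean absolute jump in block number
    between consecutive records.  0 = every record in the same block
    as its predecessor; larger = more scattered.
    """
    # Assumed recid -> block mapping (simplified V9 layout).
    blocks = [r // records_per_block for r in recids]
    n = len(blocks)
    size = max(1, n // sections)
    results = []
    for i in range(0, n, size):
        chunk = blocks[i:i + size]
        if len(chunk) < 2:
            continue
        jumps = [abs(b - a) for a, b in zip(chunk, chunk[1:])]
        mean_jump = sum(jumps) / len(jumps)
        results.append((i, i + len(chunk) - 1, round(mean_jump, 2)))
    return results
```

A table dumped/reloaded yesterday should score near 0 in every section; a "fragmented at the end" table would show low values in the early sections and a much higher value in the last one.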

Thanks,
Ron.
 

RealHeavyDude

Well-Known Member
OTOH I would check what kind of storage areas (type I or type II) the offending tables are in. Most databases I've seen were migrated from V6 to OpenEdge 10 using the convert utility over and over again, thereby ending up with the same junk (non-optimized) data structure in the new release as in the old one. If type I storage areas are used, then migrating to type II storage areas - and optimizing the storage area design in the process - will almost certainly get rid of the need to D/L. There are numerous posts in this forum on what the different storage area types are and what principles to follow for an optimized storage area design.

BTW - as the storage-area re-design will require you to D/L, this would be the last one ...

Heavy Regards, RealHeavyDude.
 

ron

Member
Yes - I'm quite aware of type II storage areas and the advantages of using them. But rolling out the upgrades to Linux/Progress 10.2B at very remote sites is a pretty slow process. Most sites will have to "suffer" remaining with Progress 9.1D (and type I storage areas) for up to two years, so I'm looking for a reason why their performance degrades so quickly after a D/L.
 

RealHeavyDude

Well-Known Member
9.1D ( from today's point of view, of course ) sucks - it is stone age software ...

But there is still an option to separate the "offending tables" into their own type I storage areas and thus minimize the scatter problem in V9. This problem has been a "feature" of the Progress database from the beginning. Apart from using type II storage areas or moving the tables concerned into their own storage area, there is not much you can do other than disallow the users from creating and deleting records ;-)

In general there might be potential for other performance tuning - which won't help with the scatter.

Heavy Regards, RealHeavyDude.
 

ron

Member
Yep - I'm with you. But the trouble is I don't know which tables are the "offending" ones. That's why I'd like to be able to get scatter factors for sections of tables. I think that whereas tabanalys says every table is "great", some big, frequently-accessed tables may be very disorganised in sections. That's why I wanted to know how the scatter factors are calculated. I'd imagine it's not especially complex.
 

cj_brandt

Active Member
To know which tables may be causing performance issues, I would enable table and index stats. These will show which tables have the highest read counts. Stick those tables in their own area, as RHD suggested.

Most apps do 80% of their activity against the last 3 months of data. After 5 months, the majority of the data the app is processing hasn't benefited from your last D&L.

Perhaps this gets at what you are trying to determine about scatter: pick a table with a large number of reads and, using the index with the most activity, read a record from the last week. Then start reading the records after it and determine which database block each one is in. If a lot of your records come from the same DB block, you will have little disk I/O and good performance. If your records are all stored in different DB blocks, you will have a lot of disk I/O and poor performance.
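The check described above can be sketched as a single number: walk the recids in index order and measure how often the next record lives in the same block as the previous one. This again assumes `recid // records_per_block` identifies the block, which is a simplification of the real V9 recid layout, so treat it as a rough locality indicator rather than an exact measurement.

```python
def block_locality(recids, records_per_block=32):
    """Fraction of consecutive records (in index order) that share a
    database block.  Close to 1.0 -> few block reads, good locality;
    close to 0.0 -> nearly every record costs its own block read.
    Assumes recid // records_per_block identifies the block, which is
    a simplification of the real V9 recid layout.
    """
    blocks = [r // records_per_block for r in recids]
    if len(blocks) < 2:
        return 1.0  # nothing to compare
    same = sum(1 for a, b in zip(blocks, blocks[1:]) if a == b)
    return same / (len(blocks) - 1)
```

Run it over, say, the last week's records from the busiest index: a freshly reloaded table should score near 1.0, and a score near 0.0 on recent data would explain why users complain long before tabanalys does.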
 

ron

Member
I already collect LOTS of stats for the in-house servers - but not at the remote sites. But I like your idea of checking for contiguous records in the same block. I can do that. Thanks for the idea!
 