DB size blow-out after converting to 9.1D

ron

Member
Our main db is 110GB. It was dumped/reloaded last December and last April under Progress 8.3E. Each time the DB shrank by about 8 GB ... and regained this 8 GB within three weeks as indexes lost their high degree of compression. (All normal.)

On 18th May we upgraded to 9.1D05 (last week to 9.1D06) - splitting the whole DB into eight Areas. Three Areas contain indexes only ... for the largest tables we have.

Since converting to 9.1D the DB is growing at an alarming rate; it is now 13.4 GB larger than when converted. In the same period after the prior two D&L's it grew 7.6 and 6.1 GB. ALL of this excessive growth is within the Areas that contain indexes.

Has anyone else seen a similar effect?

Does anyone know what causes this?

Regards,
Ron.
 

MurrayH

Member
Don't forget that each record in V9 requires an extra 12 bytes (??) over V8

ron said:
Since converting to 9.1D the DB is growing at an alarming rate; it is now 13.4 GB larger than when converted. ... ALL of this excessive growth is within the Areas that contain indexes.
 

ron

Member
MurrayH said:
Don't forget that each record in V9 requires an extra 12 bytes (??) over V8

Not forgotten. We presume that the initial 8GB increase in the total DB size at the time of upgrading to V9 was because of the increased size of the record headers. That was expected - and isn't a problem.

It's the rampant growth of the Areas that only contain indexes that is concerning me.
 

ron

Member
MurrayH said:
Is the application doing major index changes?

The application is a billing system (water utility). Yes, keys get added to the indexes at a rather high rate. That's why, when we do a Dump/Reload/Re-index, the indexes always grow very quickly for the next 2 to 3 weeks as nodes get split.

The problem is that this growth after the D&L is much greater than we used to have under Progress 8 (nearly double) ... and the rate is not starting to taper off (as it used to with Progress 8).
 

MurrayH

Member
Index only areas should be 1 (ONE) record per block - is that set?

ron said:
The problem is that this growth after the D&L is much greater than we used to have under Progress 8 (nearly double) ... and the rate is not starting to taper off.
 

ron

Member
MurrayH said:
Index only areas should be 1 (ONE) record per block - is that set?

No, it isn't, Murray - 128 records/block is set for every Area.

I don't understand why records/block has any meaning at all to Progress in the case of an Area that only contains indexes.

If we should set "1" record/block for index-only Areas ... I would unquestionably consider that to be a bug!

Anyone else have experiences/opinions in this realm?
 

Corwin

New Member
As Murray said, use 1!

Why: Using a value of one says "use as much space as you want" when applied to indexes only.

At 8,192 bytes per block and 128 recs per block you are going with a fixed size of 64 bytes per record. If you run an idxanal and find a mean size of 32 bytes (I am deliberately ignoring those 12 bytes that Murray mentioned) then that means half the space is unused. If the mean size is 65 bytes, then two 64-byte slots are needed fairly frequently, and again you are only using about half the space.
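
To make that arithmetic concrete, here is a rough Python sketch (it assumes the fixed-slot model described above and ignores block headers and the 12-byte per-record overhead):

Code:
# Back-of-the-envelope slot arithmetic, assuming the fixed-slot
# model described above; block headers and the 12-byte per-record
# overhead are ignored.
BLOCK_SIZE = 8192          # bytes per block
RECS_PER_BLOCK = 128       # records-per-block setting

slot = BLOCK_SIZE // RECS_PER_BLOCK        # 64 bytes per slot
for mean_size in (32, 65):
    slots_needed = -(-mean_size // slot)   # ceiling division
    used = mean_size / (slots_needed * slot)
    print(f"mean {mean_size:>2} bytes: {slots_needed} slot(s), "
          f"{used:.0%} of allocated space used")
# mean 32 bytes: 1 slot(s), 50% of allocated space used
# mean 65 bytes: 2 slot(s), 51% of allocated space used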

So... just let Progress take care of the index space utilization. It does set aside some space for expansion (I hope this is true, I know it is for data space) so it should handle things if your keys change in an existing block (up to a point, watch the scatter for that).

BTW: If you do use a value of 1, make sure you do not store anything but indexes in that space, otherwise you get a lot of wasted space.

Good luck!
 
toby.Harman

Member
OK - Having spent some time reading Corwin's post, I have the same issue that Ron has.

When Progress allocates a block to an index (remember - a block gets allocated to a particular index, never shared), it adds the block reference into the B-tree structure at the appropriate point and then starts putting index keys in that block.

The number of records per block is (or should be) totally irrelevant, since index information is separate from data and handled differently.

This has been the case since v6 if not v4 - but I didn't understand block structures back then - and as far as I am aware has remained the same ever since.

By Corwin's maths we should all be setting the number of records per block on the index areas to 512 or more to cram as many records in as possible.

It is worth noting at this point that setting records per block = 128 does NOT create 128 structures each fixed at 64 bytes. It creates a block header with a small per-record entry that points to each record's position within the block - but only for data blocks. Those 128 records could very well be 3 records of 2K and 125 records of 5 bytes.
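
A quick sanity check of that mixed-record example in Python (the block-header and directory-entry sizes are assumptions for illustration only, not Progress's actual on-disk numbers):

Code:
# Sanity check: wildly different record sizes sharing one 8K data
# block, tracked by a per-record directory entry. The header and
# entry sizes below are assumptions for illustration only.
BLOCK_SIZE = 8192
BLOCK_HEADER = 16                  # assumed block header size
DIR_ENTRY = 2                      # assumed bytes per directory entry

records = [2048] * 3 + [5] * 125   # 3 x 2K plus 125 x 5-byte records
payload = sum(records)
total = BLOCK_HEADER + DIR_ENTRY * len(records) + payload
print(f"{len(records)} records, {payload} bytes of data, "
      f"{total} of {BLOCK_SIZE} bytes used")
# -> 128 records, 6769 bytes of data, 7041 of 8192 bytes used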

A quick scan of the Progress KBASE has turned up an interesting article - 21171: What is the Proper Records Per Block Value For Index Storage Areas

This basically says that unless you have a huge table (more than 1 billion records) with huge keys that do not compress, you will see almost no benefit from this.

Sorry all - I trust Gus more than most!
 

ron

Member
MurrayH said:
I rang Progress Tech Support and they said "ALWAYS set it to '1' for index only areas".

Guys ... thank you all very much for sharing your knowledge and thoughts! It has certainly given us some threads to follow-up.

If we DO find that our problem has been caused by not setting the records per block to 1 ... my high opinion of Progress will be tarnished.

Please note: the records/block setting DOES NOT appear to have had ANY detrimental impact on the initial building of the indexes ... they all started off at "the right size". It is only the behaviour of the ongoing B-tree index maintenance that seems to be aberrant.
 
toby.Harman

Member
MurrayH said:
I rang Progress Tech Support and they said "ALWAYS set it to '1' for index only areas".

Yeah - and I'm afraid to admit it, but much as I like the guys in Melbourne - Gus knows better!

As Ron says - the indices built OK using idxbuild; it's the ongoing maintenance which is causing the problem.
 

MurrayH

Member
Now the guys from Melbourne, as far as I remember, did ask Gus.

toby.Harman said:
Yeah - and I'm afraid to admit it, but much as I like the guys in Melbourne - Gus knows better!
 

ron

Member
THE RAMPANT DB GROWTH HAS STOPPED - after 15 days.

After prior Dump/Reloads the DB always grew for the two weeks afterwards ... but never at the rate we saw this time.

But it has stopped.

We've left the records/block at 128 for all Areas. Notwithstanding the technical advice ... I still find it perplexing that it would matter.

Thanks to all who contributed.
 

ron

Member
I BELIEVE I KNOW WHAT CAUSED THIS PROBLEM

What wasn't mentioned earlier (sorry - I didn't initially realise the significance) ... was that at the same time as upgrading to Progress 9.1D ... we also did a Dump/Reload and increased the database block size from 2K to 8K. I believe that the change to blocksize was the "culprit".

A Dump/Reload requires all indexes to be rebuilt (of course) and they become tightly compacted. With a 2K blocksize ... during the week after the D&L normal index maintenance requires a large incidence of splitting of index blocks ... so the total database size grows quickly. After one week - and for about three more weeks - the incidence of splitting gradually decreases, and stops altogether at four weeks after the D&L.

The same basic thing happened at the time of the upgrade to 9.1D - because a D&L was done at the same time. However ... the blocksize is now 8K. Thus - when block-splitting happens - one split adds one more 8K block to the database ... whereas it was another 2K block previously.

The net result is that with the larger blocksize the growth in database size in the two weeks after the D&L is nearly double what it was with a 2K blocksize ... but the growth only happens for half as long. We get to a "steady state" more quickly.
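
A quick check of the figures from the start of this thread bears that out (a trivial Python sketch; the GB figures are the ones reported above):

Code:
# Growth in the weeks following each D&L, using the GB figures
# reported at the start of this thread.
growth_2k = [7.6, 6.1]    # the two Progress 8 reloads (2K blocks)
growth_8k = 13.4          # the 9.1D reload (8K blocks)

avg_2k = sum(growth_2k) / len(growth_2k)
print(f"8K growth is {growth_8k / avg_2k:.1f}x the 2K average")
# -> about 2.0x, i.e. "nearly double"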

Thus - the problem appears to have nothing at all to do with the upgrade to Progress 9.1D ... but was due to increasing the blocksize.
 

dje

New Member
Just a quick follow-up to Ron's last post:

The observed behaviour appears to have been caused by a combination of a (very) large table, an index whose active keyspace is randomly distributed throughout the total keyspace of the index, and the way that idxbuild works.

As previously noted, idxbuild packs indexes as tightly as possible. In a large table, that means the majority of index blocks are completely full.

This latter point is important because when a key must be added to a full index block, the block is split in two; with one block's worth of keys now occupying two blocks (and thus approx. 50% free space in each block), the new key can be added. (Incidentally, with a very large index, the first key added after an idxbuild might result in a block split propagating all the way to the top of the index tree - for example, in a 5-level index, 4 splits might take place to add one key.)

Most tables have indexes defined such that the bulk of the index never changes. For example, with an index on transaction date, all the activity happens at one end of the index, and most of the index is never changed. Therefore there is no observable impact on database size after an index rebuild.

However, if you have an index keyed (in part) on, for example, customer number or customer name, new records might get added at any point in the index. If the new keys are randomly distributed throughout the index's key space, your index will very quickly grow in size as block splits occur.
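
Here is a toy Python simulation of exactly that scenario - leaf blocks packed 100% full as idxbuild leaves them, then keys inserted at random keyspace positions (the capacities and counts are made-up; only the shape of the result matters):

Code:
import random

# Toy model: each leaf block holds up to CAPACITY keys. idxbuild
# leaves every leaf packed full; each subsequent insert lands at a
# random keyspace position; a full leaf splits into two half-full
# leaves before the key is added.
random.seed(42)
CAPACITY = 100
leaves = [CAPACITY] * 5_000          # keys per leaf: all 100% full
splits = 0

for step in range(1, 150_001):
    i = random.randrange(len(leaves))
    if leaves[i] == CAPACITY:        # full block: split it
        leaves[i] = CAPACITY // 2
        leaves.insert(i + 1, CAPACITY - CAPACITY // 2)
        splits += 1
        i += random.randrange(2)     # key lands in one of the halves
    leaves[i] += 1
    if step % 50_000 == 0:
        fill = sum(leaves) / (len(leaves) * CAPACITY)
        print(f"{step:>7} inserts: {len(leaves):>6} leaves, "
              f"{splits:>6} splits, average fill {fill:.0%}")

The average fill falls from 100% towards roughly 70% - the classical steady-state utilisation of a B-tree under random inserts is about ln 2 ≈ 69% - which is consistent with the post-rebuild growth Ron describes.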

A reminder that the circumstance which causes this is not common (but not all that unusual), and is exacerbated by the change in block size and by the sheer size of the table concerned (I think you'll find that the table concerned is about 50-70% of the total database size in question).

The problem could be ameliorated by allowing the specification of a maximum fill factor for the idxbuild algorithm, as the number of new keys is not large compared to the total key population of the index - simply specifying that a 90% fill factor is to be used would probably almost eliminate the post-idxbuild block splitting phenomenon. I do not know if there are already plans to put such a facility in place, however.
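
To put rough numbers on that (a hypothetical Python sketch - the key counts and fill factor below are made-up for illustration):

Code:
# Headroom bought by building each leaf only 90% full, compared with
# the number of keys added before growth tapers off. All figures
# here are hypothetical.
CAPACITY = 100                 # keys per leaf block
FILL = 0.90                    # proposed idxbuild fill factor
total_keys = 50_000_000        # existing key population (made-up)
new_keys = 1_000_000           # post-rebuild inserts (made-up)

n_leaves = total_keys / (CAPACITY * FILL)
headroom = n_leaves * CAPACITY * (1 - FILL)
print(f"{headroom:,.0f} free key slots vs {new_keys:,} new keys "
      f"({headroom / new_keys:.1f}x headroom)")
# -> ~5.6x headroom: most leaves never fill, so few post-idxbuild
#    splits would occur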

- David Eddy
 

dje

New Member
Records per block in index-only storage areas

Just another point, my $0.02 regarding index areas:

Personally I think that using 1 record per block is needlessly risky - unless you are planning to have your storage area exceed 256GB (for an 8kB block size) I frankly don't see that there's any upside at all, and there is a tremendous downside: If somebody accidentally puts a table into a 1rpb area then the space wastage will be extreme! It would put Ron's observed DB growth to shame.

So my recommendation to people is to use 64rpb for index areas unless you actually intend to exceed 64GB in the area (a safety factor of 4).
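
For the record, here is the addressing arithmetic behind those limits (a Python sketch assuming a 31-bit recid address space - an assumption on my part, but one that reproduces the 256GB figure above):

Code:
# Maximum storage-area size as a function of records per block,
# assuming a 31-bit recid address space (an assumption that matches
# the 256GB figure for 64rpb at an 8kB block size).
BLOCK_SIZE = 8 * 1024

for rpb in (1, 64, 128):
    max_area = (2**31 // rpb) * BLOCK_SIZE
    print(f"rpb {rpb:>3}: max area {max_area / 2**30:>7,.0f} GB")
# rpb   1:  16,384 GB (16 TB)
# rpb  64:     256 GB
# rpb 128:     128 GB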

- David Eddy
 

ron

Member
dje said:
The observed behaviour appears to have been caused by a combination of a (very) large table, an index whose active keyspace is randomly distributed throughout the total keyspace of the index, and the way that idxbuild works.

I agree with all of that, Dave.

It is a fact that the application we use has several indexes on all major tables (which are INDEED ... BIG). In general I am sure that Dave's observation is valid: ONE index per table would remain very compact because records tend to only be added "at the end". But for each other index the new keys inserted would be highly random -- causing the index to go from (close to) 100% full to about 70% within a short time (two weeks --- to be EXACT!).

Ron.
 