Question: What Data Does _indexstat Provide?

Rob Fitzpatrick

ProgressTalk.com Sponsor
The size of the "idxCustomerCode" index on disk was significantly reduced. A table scan (e.g. for each customer use-index idxCustomerCode) would read 69.4% as many index blocks after compaction as before (2186622 / 3149652).

In production, though, you shouldn't specify a compaction factor of 100%, as that will lead to block splits as record insertions and updates happen. That is expensive. If you specify a compaction factor of 80, some free space will be left in the blocks for future growth, and you'll still have the benefits of compaction: improved cache hits and reduced physical I/O.
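As a quick sanity check on the numbers above, here is the arithmetic as a small Python sketch; the block counts are the before/after figures quoted earlier in the thread.

```python
# Back-of-envelope check of the compaction figures quoted in this thread.
blocks_before = 3_149_652   # index blocks before idxcompact
blocks_after = 2_186_622    # index blocks after idxcompact

ratio = blocks_after / blocks_before
print(f"index blocks read after compaction: {ratio:.1%} of before")  # ~69.4%

# With a compaction factor of 80, roughly 20% of each block is left free
# to absorb future key insertions, deferring block splits.
free_per_block = 1 - 0.80
print(f"free space left per block: {free_per_block:.0%}")
```

That ~20% of slack per block is what soaks up future key insertions without an immediate split.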

I would also test the performance of idxbuild to compare with idxcompact. You mentioned the 11.6 documentation. Are you doing this work with 11.6 binaries?
 

BigSlick

Member
All benefits then!

I didn't specify any index compaction factor in test, so it would have defaulted to 80.

Although idxbuild would probably be quicker, I think the business would be happier if it can be done online at off-peak times. But it's something I will look into.

Currently we are on 10.2B SP07, but the test box I am using is for us to upgrade to 11.6, so everything I am doing is in 11.6. Not sure what you mean by 'binaries' though Rob :(
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Not sure what you mean by 'binaries' though Rob
I just meant "what version of OpenEdge are you using". You answered that, thanks.

There are important improvements to idxbuild performance in 10.2B06 and later, though at least 10.2B07 on Unix or 10.2B08 on Windows is preferred due to memory allocation optimizations.

Two days would be more than enough time to rebuild those indexes.

Details from the PUG Challenge Americas conference:
slides: http://pugchallenge.org/downloads2013/261_Index_Rebuild_Perf.pptx
audio: http://pugchallenge.org/downloads2013/audio/261_Index_Rebuild_Perf.mp3
 

TomBascom

Curmudgeon
You can get a lot done on a long weekend. Especially if the hardware is any good.

But it takes a lot of testing and preparation if you want it to go smoothly.
 

BigSlick

Member
I'm just rerunning the idxcompact now to get timings via a script, so I can persuade the powers that be to allow me to run this in live. The timings may not be entirely accurate, as the indexes have already been compacted once; but then again this box isn't as powerful as live, so maybe the two will even themselves out.

The majority of the indexes are in areas with RPB 1, BPC 64, with the smaller indexes (not listed here) in areas with RPB 1, BPC 8. Would moving the larger ones to RPB 1, BPC 512 and the smaller ones to RPB 1, BPC 64 help?
 

Cringer

ProgressTalk.com Moderator
Staff member
When running this in live, keep a close eye on your AI files. They can grow quite considerably. I had to quadruple the frequency of the script that moves the archived AI files off the production system for backup while it was running, so we didn't run out of storage. Yes, that was partly poor design on the box, but it's one to watch out for.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
The majority of the indexes are in areas with RPB 1, BPC 64, with the smaller indexes (not listed here) in areas with RPB 1, BPC 8. Would moving the larger ones to RPB 1, BPC 512 and the smaller ones to RPB 1, BPC 64 help?
I tend to use 64 BPC for index areas. Do you really need 512 for large indexes? I guess that depends on how quickly they grow and whether you are writing within a fixed extent or growing the variable extent, and if so how often you are growing it. Depending on how frequently extends are happening it may not matter much.
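A rough way to reason about BPC versus growth is to estimate how often an area has to allocate a new cluster. The daily growth figure below is hypothetical, purely for illustration:

```python
# How often does a growing index area have to allocate a new cluster?
# The growth rate here is made up for illustration only.
block_size_kb = 8               # 8 KB database blocks
index_growth_mb_per_day = 50    # hypothetical daily index growth

blocks_per_day = index_growth_mb_per_day * 1024 // block_size_kb  # 6400 blocks/day

for bpc in (8, 64, 512):
    clusters_per_day = blocks_per_day / bpc
    print(f"BPC {bpc:>3}: ~{clusters_per_day:g} cluster allocations per day")
```

The faster the index grows, the more a larger cluster size pays off; for a slow-growing index the difference between 64 and 512 may indeed not matter much.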

I no longer use RPB 1 for index areas. In the old days (pre-10.1B) rowids were 32 bits long so they were a precious resource to be conserved. Now rowids are 64 bits so in practical terms, you'll never run out. That eliminates the one advantage of RPB 1. And of course the big disadvantage remains: if anyone is ever careless and adds or moves a table to an index area, the area will grow like wildfire, the blocks that store those records will be nearly empty, and performance will suffer. No, that shouldn't happen if you're careful. But people are fallible.

So as a safety measure I use RPB 128 for index areas (I use 8 KB DB blocks). And my periodic DB checks include a report on objects per area so it is easy to see issues like empty areas, mixed-use areas (tables and indexes together), objects in the wrong areas, and objects in the schema area.
 

BigSlick

Member
Thanks Rob. Sounds good to me.

I read a previous post about how the number of levels makes a difference to the choice of BPC, but unless we get to about 6 levels, 64 BPC should be more than sufficient.

I'm the one that checks the .df files before any load to live, so the axe would fall on me if anything did appear in the wrong storage area!
 

BigSlick

Member
From the table above, IdxItemCode kicked out 61 billion reads in a 24-hour period.

Now that the indexes have been compacted, will the reads be lower, since there are fewer blocks to look in? Or will the reads stay the same, with the time taken to read the records improving due to the decreased number of blocks to read?

Thanks
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Now that the indexes have been compacted, will the reads be lower, since there are fewer blocks to look in? Or will the reads stay the same, with the time taken to read the records improving due to the decreased number of blocks to read?
The logical reads won't change. To me, 61 billion reads in a day sounds like a lot. Of course, your database is larger than mine so maybe that number is reasonable given your users and their activity. Maybe. But I wouldn't be surprised if there are table scans or other poorly-bracketed queries (or entirely unnecessary queries) in the application. So you may want to dive into the _UserIndexStat data to see who is responsible for all those reads, and then look further into what code they are running.

The physical reads required to get those index blocks into the buffer pool will be reduced. Also, memory pressure on data in the buffer pool will be reduced, i.e. blocks already there won't be evicted as quickly. Both mean that caching efficiency will be improved. But heroic DB tuning will only go so far. Eventually you have to fix the code (assuming it's sub-optimal). :)
 

BigSlick

Member
I thought that was the case! To me 61 billion is a lot; our biggest is 149 billion! And that's on a table with 143,000 records. Yes, there are issues; finding them is what I plan to do next. I've done an xref on all our code; it's just the small matter of going through it all! I'll look at _UserIndexStat though and see if that helps.

All benefits then. Hoping to get a dump and load in the future, as well as an upgrade to 11.6. Ideally I'd like the database to be smaller for both of these; hence the index removals.
 

Cringer

ProgressTalk.com Moderator
Staff member
In my previous place we had a 200-record table that was the 4th most read in the application - and we're talking billions of reads a day. Horrible stuff. If you get onto a more modern Progress version you can at least move such tables to the secondary buffer pool so they are always buffered. But that doesn't fix the crappy code; it just hides its impact somewhat.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Another way to reduce the size of the database is to purge unnecessary data prior to the dump and load. It will also make that process go faster. Do you have a data retention policy for your transaction tables? Depending on your industry, you may have regulatory guidelines on that. But keeping old data around indefinitely may be a legal liability in addition to being an operational burden. So it helps you and the company if you can get the business to set rules and stick to them.

To me 61 billion is a lot; our most is 149 billion! And that's on a table with 143,000 records.
That's a pretty small table for that number of daily reads. Assign it and its indexes to the alternate buffer pool. You'll need a database restart to enable -B2 if you haven't already. And do the same for other relatively small and static objects that are very high in reads. And assign the schema area to ABP too while you're at it.
 

TheMadDBA

Active Member
61 billion a day is indeed quite a lot... especially for a table with that few records.

For a frame of reference: I ran a 4000 user system with a 2.5TB database and we didn't have total reads (all indexes) that high per day even with tons of reports and data extracts running. Unless the developers released some crappy code that is :)

Cringer and Rob gave you good advice on the short term fix (-B2). At some point the code will have to be fixed though.
 

TomBascom

Curmudgeon
Lots of good advice.

There does seem to be some confusion about a couple of things and some loose threads:

1) "levels" are not related to blocks per cluster. I'm not sure where you got that idea.

2) speaking of blocks per cluster -- for indexes this large 512 makes the most sense. using larger clusters would reduce the frequency of allocating clusters from the OS. I would not use 8 at all for this db -- it is much too big. and 64 seems fairly unlikely too. Unless you have an index with less than a few hundred blocks (none of those shown above) then 512 makes the most sense to me.

3) logical reads will be reduced by reducing the number of levels in an index. for instance, index 624 went from 5 levels to 4. That should correspond to a roughly 20% reduction in logical reads when using that index (the same query against the same data measured before & after the idxcompact). none of the others changed though so the overall reduction is probably small.

4) 61 billion.... yeah, that's crazy high. 9999 times out of 10 that kind of thing is driven by continual table scans against very small tables. put those tables in the alternate buffer pool. and set -lruskips and -lru2skips. you can probably find the culprits pretty quickly by simply looking at overall table stats rather than individual table stats. protop will help a lot too. in the end it is going to be a matter of fixing the crappy code -- good luck with that ;)
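Point 3 can be sanity-checked with quick arithmetic: for a keyed retrieval, logical reads scale roughly with the number of B-tree levels traversed from root to leaf.

```python
# Index 624 went from 5 levels to 4 after the idxcompact.
# Logical reads per keyed lookup ~ number of B-tree levels traversed.
levels_before = 5
levels_after = 4

reduction = (levels_before - levels_after) / levels_before
print(f"approx. reduction in logical reads for that index: {reduction:.0%}")  # 20%
```

Since none of the other indexes lost a level, the overall reduction across the database is likely small, as noted above.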
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
"levels" are not related to blocks per cluster. I'm not sure where you got that idea.
I don't remember if I was responsible for that item (I don't think I was, but I'm a quart low on caffeine at the moment), but as I recall from a discussion with Rich, where he recommended (at least) 64 BPC for indexes, the justification went like this:
  • For a given object, there will be on average half a cluster's worth of blocks free below the area HWM. In an 8 BPC area, an average object will have 4 free blocks available.
  • An index key insert at the leaf level may necessitate a block split and so an update to the block at the level above.
  • That intermediate-level update may also cause a block split, etc. Worst case, the block splits and updates propagate up the B-tree to the root. So that one logical update could necessitate the allocation of a number of new index blocks equal to (index levels - 1). If the index has 6 levels, that logical index update could add 5 blocks to the B-tree.
  • If there aren't 5 free blocks available, it is necessary to grow the area HWM by one cluster, which at a minimum involves formatting a cluster's worth of blocks and updating some chains, but could also result in waiting on a file system I/O to grow the file if the HWM is in the variable extent.
  • In practice, you probably won't see indexes with more than 6 levels. So BPC 8 could be a problem fairly often in this scenario (4 blocks free on average) whereas BPC 64 won't (32 blocks free on average).
So that's kind of a correlation between BPC and index levels, if you squint. ;) All that said though, I'm not disagreeing with you. :) I don't imagine that worst case is a common scenario.
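Rob's worst case above can be put into numbers. This is only a sketch of his argument; the half-cluster free-space average and the levels-minus-one split propagation are the assumptions from the bullets, not measured figures.

```python
# Worst-case block-split propagation vs. average free blocks below the HWM.
def avg_free_blocks(bpc):
    # Assumption from the thread: on average, half a cluster's worth
    # of blocks is free below the area high-water mark.
    return bpc // 2

def worst_case_new_blocks(levels):
    # Assumption from the thread: a leaf insert can split all the way
    # up the B-tree, needing (levels - 1) new blocks.
    return levels - 1

levels = 6                                   # a very deep index
needed = worst_case_new_blocks(levels)       # 5 new blocks
for bpc in (8, 64):
    free = avg_free_blocks(bpc)
    verdict = "enough" if free >= needed else "NOT enough -> grow HWM by a cluster"
    print(f"BPC {bpc:>2}: avg {free} free blocks, need {needed}: {verdict}")
```

So under those assumptions, BPC 8 (4 free blocks on average) can force a cluster allocation in this scenario, while BPC 64 (32 free on average) comfortably cannot.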
 