Question How to calculate the best number of records per block for Storage Areas

#1
Hi everybody,

according to this Knowledgebase-Article: Knowledge Article i try to figure out the best RPB-value for our tables and I have one table, where I would like to have your opinion. The characteristics of this table are as follows:

Table Records Size Min Max Mean
PUB.fibuc 14373100 2.4G 114 1381 174


As we are on Windows, we use the default-blocksize of 4K, which means that 22 records would fit into one block: (4096 - 100) / 174

if I chose 32 RPB, I would waste 10 rowid-slots per block, which would be ~6.500.500 slots
if I chose 16 RPB, I would waste (almost) no rowid-slots, but there could be a lot of empty data-space in each block.

The thing is, that the records in this table never are updated, so they can not grow and this "data-space" is wasted for ever.

Which solution would you guys choose ?

TIA, Wolf
 

RealHeavyDude

Well-Known Member
#2
For one - in recent version of OE11 you should be fine with 8K block size on Windows too. There are old versions of OE10 and Progress V9 in which you could end up with a silently corrupted database. You still could end up with a corrupted database in more recent versions of OE11 but it won't go unnoticed. I've never heard of any instance where setting the DB block size to 8K on the Intel platform ( Linux / Windows ) where the blocks size is 4K caused any problems.
Maybe others have.

IMHO - I would only care wasting record slots when I would be in danger of reaching the limit of total records per table/storage area/database, which, in the past, only was an an issue when that limit was determined by the 32Bit database. I'll assume that you are not running an old 32Bit database and therefore I would care more for the wasted space - especially in memory.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
#3
I agree with RHD: "wasted rowids" was a potential problem when setting a high RPB, prior to 10.1B when rowids were 32 bits long. Now that they are 64 bits it shouldn't be a concern (whether you run 32-bit or 64-bit OpenEdge). If you had a table that was so massive that you thought you could run out of 64-bit rowids in a storage area then you should be running Table Partitioning where one table can be partitioned across multiple storage areas.

I also wouldn't follow the advice in that article with regard to calculating your blocks' actual RPB. The block overhead is not exactly 100 bytes. The structure of an RM (data) block goes something like this (caveats to follow):
  • Block header: 64 bytes (80 bytes for the first or last block in a cluster)
  • RM header: 4 bytes
  • Row directory: variable (2 bytes per row)
  • Create limit: 150 bytes of empty space (by default, can be changed)
  • Remaining contiguous empty space
  • Rows
Caveats:
  • I assume you're using Type II storage areas.
  • I'm not an internals expert. There might be some subtle things I'm missing or have wrong, but I'm fairly confident that the above is correct.
If you are sure that the records in this table are never updated so you don't have to worry about fragmentation, and you want the data to be packed as efficiently as possible, then:
  • Use 8 KB for your database block size (FWIW I use 8 KB on all platforms).
  • Use proutil dbname -C settablecreatelimit tablename create-limit to reduce the "wasted" space reserved for updates. The minimum value is 32 bytes. Given your mean record size that might buy you one more record per block if you're lucky.
  • Use 512 for this table's storage area blocks per cluster.
  • Use 128 for this table's storage area records per block (whether you use 8KB or 4 KB blocks). I explain why below.
The reason why people go to lengths to calculate a value for RPB is to minimize the effect of making a bad choice. If you choose a value that is too low, you will hit the RPB limit before the block is full. The effects are wasted empty space in the blocks that could have stored more records, and therefore poor caching efficiency when reading the data, as well as the area being larger on disk than it needs to be. If you choose a value that is too high, the result could be excessive record fragmentation, if the data access pattern for the table is such that records are written to fill a block and they are later updated to increase their size. When the row directory is full and the size increase from the updates exceeds the create limit, there is nowhere left to write the new data so the record(s) must be fragmented across multiple blocks. This also has a negative performance impact when later reading these records, due to more disk I/Os and reduced caching efficiency.

But in the case where you state you aren't concerned about record growth after the create, choosing too high for RPB has no negative effect. You only have to choose an RPB that won't waste space in any blocks that could have been used for creates. The worst case would be an 8 KB block that contains your smallest records (114 bytes).

8192 (block size)
- 68 (headers)
- 32 (minimum create limit)
= 8092 (space for records)

8092 / 116 (minimum row size + 2 bytes per row for row directory entry)
= 69

So if you chose 64 RPB then in theory, you could maybe have a few blocks that contained slightly fewer records than is possible. If you chose RPB 128, that wouldn't be possible even in theory.

Closing thought: 2.4 GB isn't terribly large for a table. I could see you worrying about it if it were a 2.4 TB table, but all of this won't make a tremendous amount of difference unless you expect this table to grow a lot larger, or you intend to read the data quite a lot.
 
Top