Database buffer -B change slows down QAD programs

Ernestas

New Member
Hello, so a customer of ours has QAD 2013 Standard Edition. Their system is gradually slowing down, so I did some investigation. I found that the rule of thumb is that the -B parameter should be 10-15% of database size. The customer's database is now 230GB, and their buffer size was 2.4GB. I increased the buffer size (tweaking the -B parameter) to 15GB and their system started to work 2x slower. When checking resources, I noticed that QAD itself does not fully use the buffer; it only uses ~1.8GB (meaning 2.4GB was enough). I did some research on the internet but unfortunately I could not find anything that would help, so as my last resort I'm posting here. If needed, I can attach all the data needed (later).

Progress version is 10.2B

Thanks in advance for help and insights.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Do you have evidence that the slow downs were caused by, and not merely coincident with, the configuration change? Do you know with certainty that nothing else changed at that time, either in the database or in the wider environment? E.g. OS, network, server workload, virtualization, storage, etc. Don't conflate correlation with causation.

So I found that the rule of thumb is that the -B parameter should be 10-15% of database size.
This is outdated guidance. The appropriate size of the buffer pool for a given database instance really depends on how the client applications are written and how the users behave, in addition to the database size. The database size puts an upper bound on the appropriate buffer pool size. If a database is very large but is mostly historical data, and the users only read very recent data, then the working set of the application, i.e. the data that is cached in the buffer pool as a result of user activity, may be quite small in comparison.

But if clients read a lot of data then the working set of the applications is much larger and the database and its users could benefit from a much larger buffer pool. Note that this can happen even if the users aren't attempting to read large quantities of data, for example if the queries are poorly written by the programmer, or if the schema doesn't provide appropriate indexes to match the query predicates. Business users given free rein to run ad hoc queries in a BI tool are also a common cause of this.
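One practical wrinkle when applying any sizing guidance: -B is specified in database blocks, not bytes, so the database block size matters when translating a target in GB into a startup parameter. A minimal sketch of the conversion, assuming an 8 KB block size (check your database's actual block size; the function names here are mine, not Progress utilities):

```python
# -B is counted in database blocks, not bytes. The 8 KB block size
# below is an assumption; substitute your database's actual block size.
BLOCK_KB = 8

def b_for_gb(buffer_gb: float, block_kb: int = BLOCK_KB) -> int:
    """Approximate -B value (in blocks) for a buffer pool of buffer_gb GB."""
    return int(buffer_gb * 1024 * 1024 // block_kb)

def gb_for_b(b_blocks: int, block_kb: int = BLOCK_KB) -> float:
    """Approximate RAM (GB) consumed by a given -B value."""
    return b_blocks * block_kb / (1024 * 1024)

print(b_for_gb(2.4))   # the original ~2.4 GB buffer pool
print(b_for_gb(15))    # the attempted ~15 GB buffer pool
```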

Promon R&D, 1, 7 shows you the current status of the buffer pool. Example:
Code:
07/27/22        Status: Buffer Cache
10:06:22

Total buffers:                     3002
Hash table size:                   887
Used buffers:                      33
Empty buffers:                     2969
On lru chain:                      3001
On lru2 chain:                     0
On apw queue:                      0
On ckp queue:                      0
Modified buffers:                  0
Marked for ckp:                    0
Last checkpoint number:            0
LRU force skips:                   0
LRU2 force skips:                  0

Enter <return>, R, P, T, or X (? for help):
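The original poster's observation that only ~1.8GB of the buffer was in use can be read straight off this screen: compare used buffers to total buffers. A tiny illustrative helper (not part of promon; the sample figures are from the screen above):

```python
def buffer_utilization(used_buffers: int, total_buffers: int) -> float:
    """Fraction of the buffer pool currently holding data."""
    return used_buffers / total_buffers

# Figures from the sample screen: 33 used of 3002 total buffers.
print(f"{buffer_utilization(33, 3002):.1%}")
```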
Promon R&D 2,3 shows buffer pool activity and related metrics, including OS reads (reads of a database block from disk).
Code:
07/27/22        Activity: Buffer Cache
10:12:54        07/27/22 10:06 to 07/27/22 10:06 (5 sec)

                                    Total         Per Min          Per Sec          Per Tx

Database Buffer Pool
Logical reads                         494            5928            98.80            0.00
Logical writes                          0               0             0.00            0.00
O/S reads                              26             312             5.20            0.00
O/S writes                              3              36             0.60            0.00
Checkpoints                             0               0             0.00            0.00
Marked to checkpoint                    0               0             0.00            0.00
Flushed at checkpoint                   0               0             0.00            0.00
Writes deferred                         0               0             0.00            0.00
LRU skips                               0               0             0.00            0.00
LRU writes                              0               0             0.00            0.00
APW enqueues                            0               0             0.00            0.00
Database buffer pool hit ratio:  94 %
This information won't be too meaningful until the database has been up for a while and the users have fully exercised their typical data-access patterns. Note that some of these may be periodic and infrequent, e.g. running weekly/monthly/quarterly reports or updates.
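For reference, the hit ratio on that screen can be reproduced from the logical and O/S read counters. A sketch, assuming the usual definition (the share of logical reads satisfied without a disk read) and integer truncation, which matches the 94% displayed:

```python
def hit_ratio_pct(logical_reads: int, os_reads: int) -> int:
    """Buffer pool hit ratio as a truncated integer percentage.

    Assumes the usual definition: the share of logical reads that did
    not require an O/S (disk) read.
    """
    if logical_reads == 0:
        return 100
    return (logical_reads - os_reads) * 100 // logical_reads

# Counters from the sample screen: 494 logical reads, 26 O/S reads.
print(hit_ratio_pct(494, 26))   # 94, matching the screen above
```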

A better rule of thumb is this: given a starting point of your current buffer pool size and your logical and physical (OS) I/O, if you restart your database with a larger buffer pool size and the amount of physical disk reads decreases, given the same logical I/O, then that increase in buffer pool size was beneficial for you. Note that you will not see differences with small increases in -B. Try doubling -B when testing this, assuming of course you have sufficient available memory.
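That rule of thumb reduces to a simple comparison: for the same logical I/O, did O/S reads per logical read go down after the increase? A sketch with made-up counter values (none of these figures come from the thread):

```python
def os_reads_per_logical(os_reads: int, logical_reads: int) -> float:
    """Physical (O/S) reads incurred per logical read; lower is better."""
    return os_reads / logical_reads

# Hypothetical before/after counters for the same workload.
before = os_reads_per_logical(5_000, 100_000)   # old -B
after = os_reads_per_logical(1_200, 100_000)    # -B doubled
print(after < before)   # True here: the larger -B was beneficial
```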

Remember that the buffer pool is in RAM, so higher utilization means more of the system's memory in use, so more isn't always better. Increasing database memory utilization could in some cases decrease database performance, e.g. if the increase in -B causes increased OS paging, and thus increased disk I/O, or if the increase causes database shared memory to span multiple NUMA nodes. You will need to experiment with different -B values to see what works best with your OpenEdge release, OS, hardware, database, and workload.

Progress version is 10.2b
OpenEdge 10.2B was released thirteen years ago and has been retired for three years. It will never receive new OS certifications or new patches for performance, reliability, data protection, or security. The longer your customer uses it, the greater their business risk. They should have a plan in place to migrate to an active OpenEdge release, e.g. 12.2.
 

Ernestas

New Member
Hello Rob Fitzpatrick,

Thank you for your reply and your help. Below I attach some data screens, answers, and other observations.

Do you have evidence that the slow downs were caused by, and not merely coincident with, the configuration change? Do you know with certainty that nothing else changed at that time, either in the database or in the wider environment? E.g. OS, network, server workload, virtualization, storage, etc. Don't conflate correlation with causation.
The change was made on the development server, where no workload is present. Then the changes were applied to the PROD environment, in which certain programs turned out to take 2x more time to process. Then testing with those programs was done on the DEV server, and the programs behaved the same - they took 2x more time to process than usual. The -B parameter changes were reverted on both the DEV and PROD servers, and both environments started working as before (meaning no programs took more time to process than before the changes). I am pretty confident that the slow program processing was directly impacted by the -B change; both servers behaved the same after changing the -B parameter.

Below I attach some of the current data seen on the PROD server (which is under regular load).
[Attached screenshots: 1659335231670.png, 1659335272641.png, 1659335329758.png]

Note that you will not see differences with small increases in -B. Try doubling -B when testing this, assuming of course you have sufficient available memory.
I tried doubling -B, and the programs appear to work slower (not 2x, but noticeably slower). I also noticed that when I increase the -B parameter, the programs start to work slowly. Unfortunately I don't have a screenshot anymore, but I noticed that when I change -B, reads per second start to grow (when I run those specific programs; I did a comparison: restart the server, launch the programs, observe, then repeat after changing -B from 2GB to 20GB). When the programs started working 2x slower, I observed that logical reads also increased about 2x. I cannot find info on how this is related to -B.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Is your Progress OpenEdge database 32-bit or 64-bit? If 32-bit, then that would explain it; you probably see a lot of I/Os.
With over 2.5 GB of shared memory in use, and larger values used in testing, this must be 64-bit OpenEdge RDBMS.

The change was made on the development server, where no workload is present. Then the changes were applied to the PROD environment, in which certain programs turned out to take 2x more time to process. Then testing with those programs was done on the DEV server, and the programs behaved the same - they took 2x more time to process than usual. The -B parameter changes were reverted on both the DEV and PROD servers, and both environments started working as before (meaning no programs took more time to process than before the changes). I am pretty confident that the slow program processing was directly impacted by the -B change; both servers behaved the same after changing the -B parameter.
We need to know more about your hardware. Total memory available to the OS, whether this is a NUMA system and if so, the number of NUMA nodes and amount of memory local to each. It may also be relevant to know virtualization details, if applicable. It is possible for a VM to use more memory than is physically available from the host, e.g. thin provisioning. Again, using more memory isn't always better, given the available resources.

I tried doubling -B, and the programs appear to work slower (not 2x, but noticeably slower). I also noticed that when I increase the -B parameter, the programs start to work slowly. Unfortunately I don't have a screenshot anymore, but I noticed that when I change -B, reads per second start to grow (when I run those specific programs; I did a comparison: restart the server, launch the programs, observe, then repeat after changing -B from 2GB to 20GB). When the programs started working 2x slower, I observed that logical reads also increased about 2x. I cannot find info on how this is related to -B.
If you haven't reached some performance-throttling boundary like growing database shared memory beyond a single NUMA node, then all things being equal I would expect a larger buffer pool to result in a higher rate of logical reads (reads/second), because you have a greater chance of cache hits, and also perhaps a higher ratio of logical to physical reads. I would not expect it to have any effect on the number of logical reads, as that is determined by the client's activity: how the code is written, which indexes were selected by the compiler to resolve queries, which code paths are followed based on user input. So if you saw an increase in the number of logical reads after increasing -B, as I think you are saying, that makes me wonder about your testing methodology. Are you reverting the database back to its initial pre-load state before each test? You may even need to restart the test system prior to each test, to ensure the file system buffers are in the same state from one test run to the next.
 

dbacdog

Member
With a 32-bit RDBMS, 2 GB is the limit; however, you can get a little more than 2 GB. I had a 32-bit OE RDBMS and was able to get a little more than 2 GB...

If Port Number = 31 then your RDBMS is 32-bit.
DLC/bin/showcfg DLC/progress.cfg

EXAMPLE 3:

Serial Number: 123456789
Control Numbers: XZARS - YPH3M - 2MCY6
Version Number: 10.2B
Machine Class: EB
Port Number: 31
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
With a 32-bit RDBMS, 2 GB is the limit; however, you can get a little more than 2 GB. I had a 32-bit OE RDBMS and was able to get a little more than 2 GB...

If Port Number = 31 then your RDBMS is 32-bit.
DLC/bin/showcfg DLC/progress.cfg

EXAMPLE 3:

Serial Number: 123456789
Control Numbers: XZARS - YPH3M - 2MCY6
Version Number: 10.2B
Machine Class: EB
Port Number: 31

after changing -B from 2GB to 20GB
;)
 

Tarby777

Member
Are you explicitly setting the hash table size with the "-hash" broker parameter? I've seen systems where performance dipped when the buffer pool was made bigger and the hash table wasn't big enough for it. I'll spare you the detail unless we find out that you ARE sizing it manually... suffice to say that it's frequently best to remove any "-hash" values in your startup scripts, and let Progress size it automatically.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Are you explicitly setting the hash table size with the "-hash" broker parameter? I've seen systems where performance dipped when the buffer pool was made bigger and the hash table wasn't big enough for it. I'll spare you the detail unless we find out that you ARE sizing it manually... suffice to say that it's frequently best to remove any "-hash" values in your startup scripts, and let Progress size it automatically.
Good call. Always let the broker choose the value of -hash.

It is worth noting here that increasing the size of the buffer pool online, via proutil -C increaseto -B, is not functionally equivalent to restarting the database with that larger -B value. The latter increases the size of -hash accordingly (assuming it isn't set explicitly, of course), the former does not. And -hash cannot be increased online via proutil or promon.
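To see why a stale -hash hurts: buffers are located via a hash table, and if the table stays sized for the old, small -B while the pool grows, average hash-chain length grows with it and every lookup walks longer chains. A toy model of that effect (the broker's actual sizing rule is internal to Progress; the prime-near-a-quarter-of-B heuristic below is an assumption for illustration, not documented behavior):

```python
def next_prime(n: int) -> int:
    """Smallest prime >= n (trial division is fine at this scale)."""
    def is_prime(k: int) -> bool:
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True
    while not is_prime(n):
        n += 1
    return n

def assumed_default_hash(b_blocks: int) -> int:
    """Hypothetical default -hash: a prime near -B / 4 (assumption)."""
    return next_prime(max(b_blocks // 4, 1))

old_b, new_b = 300_000, 2_000_000
h = assumed_default_hash(old_b)   # table sized for the old, small -B
print(old_b / h)                  # average chain length before increaseto
print(new_b / h)                  # chains several times longer afterward
```

Restarting with the new -B (and no explicit -hash) lets the broker resize the table, which is why the restart and the online increaseto behave differently.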
 

TomBascom

Curmudgeon
It is also worth noting that if you are using OpenEdge Exploder to adjust -B, it may silently NOT allow -hash to be chosen by the broker and may instead use an old value that was set to go with the previous, much smaller, -B.

Look in conmgr.properties. If hashtableentries does not equal zero, then a fixed value is being used, and it is not recalculated if -B is changed.

Better yet - don't use OpenEdge Explorer to manage db startup parameters. If you must use OEE, put a .pf file in the otherArgs parameter and use that to properly set db startup parameters.

And another thing...

In the descriptions of the tests above, it sounds like parameters were changed and a couple of test programs were run _once_, in series, with a "cold cache". This is not indicative of what users will see and is very deceptive. You can easily see poor test results if you do this, because the benefits of the larger cache are not available until it has been populated. In "real life" users are not working in isolation with a cold cache. They are working with a cache that is populated and which is being shared with lots of other users.
 