Resolved Memory usage for DB buffer pool in RHEL 7.8 system running on vSphere 6.5

RealHeavyDude

Well-Known Member
OpenEdge 11.7.6
RHEL 7.8
vSphere 6.5

Maybe somebody can help me to understand this:

Our production system is running on a RHEL 7.8 system which is hosted on vSphere 6.5. The VM is configured with 128 GB memory and 4 CPUs.

The dababase is 500 GB large, blocksize 8K and the buffer pool is

[2020/09/16@00:00:00.146+0200] P-1052011 T-140343696860992 I BROKER 0: (4239) Number of Database Buffers (-B): 4200000
[2020/09/16@00:00:00.146+0200] P-1052011 T-140343696860992 I BROKER 0: (17562) Number of Alternate Database Buffers (-B2): 50000

Unless I made a dumb error, from my calculation this means that the database should consume some 34 GB of memory.

Now the capacity management tells me that this system is over sized as on average it only consumes some 10 GB of memory and they recommend to down size the memory to 32 GB and I would need to justify why the system can't be down sized.

Nevertheless, I don't understand why that RHEL system would only show 10 GB memory consumption on average. From my understanding the buffer pool is allocated at database startup in its entirety. If that is the case then I would think that the RHEL system should show a memory consumption on average which is more in 50 GB range.

Does anybody have an idea what could cause this discrepancy or seen something similar?

Thanks in Advance.
 
From my understanding the buffer pool is allocated at database startup in its entirety.
It appears to me that Linux allocates all of shared memory up front, but different tools (top vs. htop vs. nmon) calculate things like resident set size differently and it's not clear to me which is correct. I suspect the OE broker's memory-allocation behaviour is going to be OS-specific and I'm really not sure what the specifics are. Maybe opening a case with TS can get you access to an engineer who can explain what's really going on.

Unless I made a dumb error, from my calculation this means that the database should consume some 34 GB of memory.
You have 32.4 GB of buffer pool (4,250,000 * 8192 / 2^30), which is most of shared memory but not all. Depending your schema and startup parameters, total shared memory will be somewhat more than that, maybe 10% more roughly. So you're in the right ballpark. You should see the total in promon 5.

If that is the case then I would think that the RHEL system should show a memory consumption on average which is more in 50 GB range.
Just curious, does this estimate include memory utilization of other processes, like process-private memory of brokers, servers, background processes & utilities, and server-side clients?

Now the capacity management tells me that this system is over sized as on average it only consumes some 10 GB of memory and they recommend to down size the memory to 32 GB and I would need to justify why the system can't be down sized.

I've had customers fight this battle with their IT people: "you don't appear to be using it now, or all the time, so I'm taking it away." A database server's memory shouldn't be reserved based on some observed average, any more than a 12-lane highway should be narrowed because sometimes the traffic is light. If the allocated resources will be needed when demand is heavy then they are needed, period. Trying to save little bits of memory here and there, and penalizing LOB applications in the process, is a false economy for the organization. But I know I am preaching to the choir here.

I think it would be helpful if they worked in collaboration with you rather than just delivering ultimatums. For example can they explain their methods for data collection, like:
  • How do you measure memory consumption? Which tools and which metrics in those tools?
  • When is the data collected and how often?
  • At what level is the data collected? E.g. per-process, overall at the OS level, or in the hypervisor host at the VM level?
Virtualization adds another level of complexity. Just as OSes can play optimization games with sharing memory pages between processes, hypervisors can now do similar things to share pages between within and between guest VMs (e.g. VMWare Transparent Page Sharing). So for example if you had a prod DB VM and a replication target VM in the same cluster, they might share lots of memory pages. If so, does a shared page get counted in the memory budget of both when it's only backed by one physical page? We don't have the answers to these questions; you would have to try to work with your capacity management people to find out.

For your part, do you know the behaviour and history of your DB? For example is your 32 GB buffer pool typically always full (promon R&D 1,7)? If so, do you know how long it takes to fill after a DB restart? Did you have downtime recently? Is it possible that the CM people took measurements just after a restart when the cache was cold? This ties back to the earlier question of how OE actually does shared memory allocation in the OS.

Sorry RHD that I have more questions than answers. Let us know what you find out.
 
You might also want to double check that the shared memory you allocated is actually being used. ProTop shows "SHM Free" on the configuration scree. Or you can pop into PROMON and check R&D, 1, 7 "empty buffers". If you have empty buffers nothing has been read into them yet and they are being "wasted" (unless you know that they will eventually be used).
 
Thanks for your response - finally I was able to get some operator to provide me the promon screens:

PROMON 5
Code:
Activity  - Sampled at 09/17/20 11:03 for 89:12:17.

Event                 Total   Per Sec   Event                 Total   Per Sec
       Commits    446767173    1391.2            Undos         6537       0.0
Record Updates    133259121     415.0     Record Reads  30903992547   96233.0
Record Creates     47012575     146.4   Record Deletes     26119836      81.3
     DB Writes     21241360      66.1         DB Reads    409411146    1274.9
     BI Writes     10749883      33.5         BI Reads      1484189       4.6
     AI Writes     10748337      33.5
  Record Locks   3984204726   12406.6     Record Waits           19       0.0
   Checkpoints          668       0.0    Buffs Flushed         4024       0.0

Rec Lock Waits    0 %    BI Buf Waits      0 %    AI Buf Waits      0 %
Writes by APW    99 %    Writes by BIW    89 %    Writes by AIW    99 %
Buffer Hits      99 %    Primary Hits     99 %    Alternate Hits  100 %
DB Size        1033 GB       BI Size    8203 MB       AI Size     433 MB
FR chain                   560 blocks   RM chain                    9 blocks
Shared Memory  35122M        Segments      2

80 Servers, 103 Users (7 Local, 96 Remote, 6 Batch),2 Apws

RETURN - repeat, U - continue uninterrupted, Q - quit:
PROMON R&D 1 7
Code:
09/17/20        Status: Buffer Cache
11:04:47

Total buffers:                     4250002
Hash table size:                   1452281
Used buffers:                      4201898
Empty buffers:                     48104
On lru chain:                      4200000
On lru2 chain:                     50000
On apw queue:                      0
On ckp queue:                      8419
Modified buffers:                  8101
Marked for ckp:                    8419
Last checkpoint number:            673
LRU force skips:                   100
LRU2 force skips:                  100

Database Buffer Pool
Active Blocks                      4201898
Master Blocks                      1
Index Blocks                       360241
Record Blocks                      3839868
Free Blocks                        941
Sequence Blocks                    1
Area Blocks                        0
Object Blocks                      845
Object List Blocks                 0
Control Blocks                     1
Cluster Map Blocks                 0
Obj Clist Blocks                   0
Block Map Blocks                   0

Enter <return> for more, R, P, T, or X (? for help):

What I see from that is the we have only 1% of empty buffers and db actually uses some 35.1 GB of memory. Therfore I would think that the buffer pool is not really over sized.

Nevertheless, the internally developed software to monitor the systems shows peaks of some 80 GB but an average usage of only 11 GB. At least the result from promon gives me ammunition to challenge their reports. It really strikes me how they come up with an average usage of 11 GB when the DB usually runs 24x7.

Thanks and best regards.
 
Obviously I know very little about this specific application but it isn’t uncommon for a lot of activity to be focused on a relatively small portion of the data. So it could be that a lot of those buffers are somewhat rarely used. If they have someway to see that then perhaps that’s what they are claiming?

One of ProTop’s interesting metrics is “churn”. The ratio of record reads to table size. It is very common for fairly small tables, sometimes with just a handful of records, to be the source of the bulk of data access. In cases like that you might have a huge -B but only a few blocks are terribly busy. The rest are only lightly used from time to time.
 
To me, measuring average memory consumption on a DB server doesn't make sense.

That said, I really think your key to success will be to be able to get these numbers in real time, and OS metrics as well, while this internally-developed software is collecting its stats, so you are comparing apples to apples.

Also, 32 GB of cache for a 1 TB DB on a machine with 128 GB of RAM seems fairly conservative, if this DB is the VM's only workload. Have you experimented with larger -B to see the effect on cache hit ratio and empty buffers?

I'm also wondering about those 48,104 empty buffers. It looks like this DB has been up for less than 4 days. Unfortunately from R&D 1,7 we don't know which buffer pool the empty buffers are in, though from R&D 2,3 ABP OS reads we could get more insight. Possible scenarios:
  • Maybe the empty buffers are all in -B and with more time, these buffers will be used completely, meaning you might benefit from larger -B.
  • Maybe the empty buffers are all in -B2, so only 1,896 out of 50,000 of them are currently in use. If so, you may want to verify that the assignments are what you expect.
  • Maybe you've cached the working set of this DB and these buffers will remain empty, so larger -B wouldn't help. Given this DB's size and apparent growth rate, I doubt this very much.
I suspect it is still too soon after the last restart to reach any conclusion about those empty buffers. I suspect that after more activity, which could include infrequently-run code and its data access activities for end of week/month/quarter, they will be filled.
 
Back
Top