Post upgrade DB Performance

Shanmugam T

New Member
Hi All,

I got downtime and performed dump & load successfully. Also migrated to Type 2 storage. there was no performance issue reported from user. Jan month end closing was peaceful :).


However i have challenges on resource part as well.

1) RAM utilization is almost 100% for past two months (shown in server performance report graph).
2) During peak load (month end), 11 CPU's were utilized.
3) Topas utility shows 50 to 60% of CPU were utilized by user processes.

Server Config: IBM AIX 6.1, 14 CPU's and 50GB RAM.

Total no of users: 1800 during monthend on server level


I have pasted db activity and buffer cache status below (15 minutes interval). Kindly suggest me do i still need to increase -B value for this database.

Apart from this database, we have few more databases running on the same server ( Two are 100 GB of size and rest all less than 15 GB of size ).


DB activity

Activity - Sampled at 02/06/15 14:17 for 107:52:22.

Event Total Per Sec Event Total Per Sec
Commits 21611991 55.7 Undos 63368 0.2
Record Updates 11794056 30.4 Record Reads 29865703177 76905.7
Record Creates 4125734 10.6 Record Deletes 2498914 6.4
DB Writes 1494455 3.8 DB Reads 188904951 486.4
BI Writes 1159342 3.0 BI Reads 117846 0.3
AI Writes 862839 2.2
Record Locks 125405719 322.9 Record Waits 461 0.0
Checkpoints 763 0.00 Buffs Flushed 5365 0.0

Rec Lock Waits 0 % BI Buf Waits 0 % AI Buf Waits 0 %
Writes by APW 96 % Writes by BIW 51 % Writes by AIW 90 %
Buffer Hits 100 %
DB Size 130 GB BI Size 6143 MB AI Size 49 MB
FR chain 2112 blocks RM chain 2 blocks
Shared Memory 4198M Segments 1

4 Servers, 447 Users (446 Local, 1 Remote, 0 Batch),5 Apws

-----------------------------------------------------------------------------------------------
Activity - Sampled at 02/06/15 14:31 for 108:07:01.

Event Total Per Sec Event Total Per Sec
Commits 21628552 55.6 Undos 63525 0.2
Record Updates 11802596 30.3 Record Reads 29937300983 76915.9
Record Creates 4129951 10.6 Record Deletes 2500505 6.4
DB Writes 1495715 3.8 DB Reads 189225870 486.2
BI Writes 1160689 3.0 BI Reads 117862 0.3
AI Writes 863697 2.2
Record Locks 125484396 322.4 Record Waits 462 0.0
Checkpoints 764 0.00 Buffs Flushed 5365 0.0

Rec Lock Waits 0 % BI Buf Waits 0 % AI Buf Waits 0 %
Writes by APW 96 % Writes by BIW 51 % Writes by AIW 90 %
Buffer Hits 100 %
DB Size 130 GB BI Size 6143 MB AI Size 9328 K
FR chain 2112 blocks RM chain 2 blocks
Shared Memory 4198M Segments 1

4 Servers, 450 Users (449 Local, 1 Remote, 0 Batch),5 Apws
-----------------------------------------------------------------------------------------------
Activity - Sampled at 02/06/15 14:50 for 108:25:20.

Event Total Per Sec Event Total Per Sec
Commits 21639443 55.4 Undos 63741 0.2
Record Updates 11812133 30.3 Record Reads 30035980761 76952.2
Record Creates 4133723 10.6 Record Deletes 2501902 6.4
DB Writes 1499017 3.8 DB Reads 189531990 485.6
BI Writes 1162059 3.0 BI Reads 117895 0.3
AI Writes 864585 2.2
Record Locks 125570571 321.7 Record Waits 464 0.0
Checkpoints 764 0.00 Buffs Flushed 5365 0.0

Rec Lock Waits 0 % BI Buf Waits 0 % AI Buf Waits 0 %
Writes by APW 96 % Writes by BIW 51 % Writes by AIW 90 %
Buffer Hits 100 %
DB Size 130 GB BI Size 6143 MB AI Size 18 MB
FR chain 2112 blocks RM chain 2 blocks
Shared Memory 4198M Segments 1

4 Servers, 449 Users (447 Local, 2 Remote, 1 Batch),5 Apws

--------------------------------------------------------

02/06/15 Status: Buffer Cache
14:17:38

Total buffers: 500002
Hash table size: 137743
Used buffers: 500002
Empty buffers: 0
On lru chain: 500001
On apw queue: 0
On ckp queue: 483
Modified buffers: -7652
Marked for ckp: 483
Last checkpoint number: 767

02/06/15 Status: Buffer Cache
14:32:15

Total buffers: 500002
Hash table size: 137743
Used buffers: 500002
Empty buffers: 0
On lru chain: 500001
On apw queue: 0
On ckp queue: 2111
Modified buffers: -7032
Marked for ckp: 2111
Last checkpoint number: 768

Enter <return>, R, P, T, or X (? for help):



02/06/15 Status: Buffer Cache
14:50:38

Total buffers: 500002
Hash table size: 137743
Used buffers: 500002
Empty buffers: 0
On lru chain: 500001
On apw queue: 0
On ckp queue: 218
Modified buffers: -7766
Marked for ckp: 218
Last checkpoint number: 768

-----------------------------------------------------------------------------------------------

Thanks everyone for helping me.
Awaiting for your response.
-Rgs
Shan
 

TomBascom

Curmudgeon
I got downtime and performed dump & load successfully. Also migrated to Type 2 storage. there was no performance issue reported from user. Jan month end closing was peaceful :).

Great news!

However i have challenges on resource part as well.

1) RAM utilization is almost 100% for past two months (shown in server performance report graph).

That is completely normal. Modern operating systems make a point of using all of your memory. To not do so would be a waste.

2) During peak load (month end), 11 CPU's were utilized.
3) Topas utility shows 50 to 60% of CPU were utilized by user processes.

Why do you think that these metrics are "challenges"?

Server Config: IBM AIX 6.1, 14 CPU's and 50GB RAM.

Total no of users: 1800 during monthend on server level

Ok. I don't know what the application is and obviously we know nothing about how much load each user might generate but 1,800 users in non-trivial. (Although there are only 447 shown below...) I would expect that you need a few CPUs and plenty of memory for that. 50GB is not much memory for an AIX server. 14 CPUs /might/ be a lot. But that depends a on which hardware this is -- p5? p6? p7? What's the clock speed? Is it virtualized? If it is how is the LPAR setup? etc...

I have pasted db activity and buffer cache status below (15 minutes interval). Kindly suggest me do i still need to increase -B value for this database.

Apart from this database, we have few more databases running on the same server ( Two are 100 GB of size and rest all less than 15 GB of size ).
Are all databases in the same LPAR?

Your data isn't very helpful -- that's not a 15 minute activity sample. It is 107 (almost 108) hours of activity data and 15 minute "samples" of status data (which is mostly static).

However -- you do seem to be doing almost 500 "db reads" (disk reads) per second over that 108 hours. That is quite a lot on average and could mean that you would benefit from increasing -B. You're only using 4GB of the 50GB that you say you have so there is probably room for increasing it and benefit from doing so.

DB activity

Activity - Sampled at 02/06/15 14:17 for 107:52:22.

Event Total Per Sec Event Total Per Sec
Commits 21611991 55.7 Undos 63368 0.2
Record Updates 11794056 30.4 Record Reads 29865703177 76905.7
Record Creates 4125734 10.6 Record Deletes 2498914 6.4
DB Writes 1494455 3.8 DB Reads 188904951 486.4
BI Writes 1159342 3.0 BI Reads 117846 0.3
AI Writes 862839 2.2
Record Locks 125405719 322.9 Record Waits 461 0.0
Checkpoints 763 0.00 Buffs Flushed 5365 0.0

Rec Lock Waits 0 % BI Buf Waits 0 % AI Buf Waits 0 %
Writes by APW 96 % Writes by BIW 51 % Writes by AIW 90 %
Buffer Hits 100 %
DB Size 130 GB BI Size 6143 MB AI Size 49 MB
FR chain 2112 blocks RM chain 2 blocks
Shared Memory 4198M Segments 1

4 Servers, 447 Users (446 Local, 1 Remote, 0 Batch),5 Apws
[/quote]
 

Shanmugam T

New Member
Hi Tom,

Yes it is LPAR setup, created logical partitioning using VIO server.

Hardware: P7+
Processor Clock Speed: 3500 MHz

Application : MFG/Pro eB2.1
Database: Openedge 10.1B
OS: AIX 6.1
Total RAM:50 GB
Total CPU: 14

We have six different site's database running on same server.
Total shared memory allocated is around 15GB for all 6 db's ( calculated from Promon > Activity).
Altogether we have around 1800 users sessions during month end.

I would like to know what is default memory size (RAM) allocated by progress to each client session. In prokb i see it is 10KB. Is that correct?

1800* (X) KB = approximate RAM size required for all 1800 users.

No of sessions.
uptime
03:41PM up 114 days, 3:41, 1575 users, load average: 7.11, 7.58, 8.36

As per you advice to increase -B value based on DB reads/sec. I would like to know what is ideal/good value for reads/sec.

Once again i pasted db activity data which is 37 hour's after weekly restart.

Activity - Sampled at 02/10/15 15:13 for 36:48:16.

Event Total Per Sec Event Total Per Sec
Commits 6092880 46.0 Undos 20274 0.2
Record Updates 4135450 31.2 Record Reads 12889651318 97283.3
Record Creates 1550409 11.7 Record Deletes 977705 7.4
DB Writes 512662 3.9 DB Reads 78582642 593.1
BI Writes 439026 3.3 BI Reads 67155 0.5
AI Writes 296917 2.2
Record Locks 33716975 254.5 Record Waits 187 0.0
Checkpoints 262 0.00 Buffs Flushed 1234 0.0

Rec Lock Waits 0 % BI Buf Waits 0 % AI Buf Waits 0 %
Writes by APW 96 % Writes by BIW 47 % Writes by AIW 90 %
Buffer Hits 100 %
DB Size 130 GB BI Size 6143 MB AI Size 46 MB
FR chain 2112 blocks RM chain 2 blocks
Shared Memory 4198M Segments 1

4 Servers, 476 Users (475 Local, 1 Remote, 1 Batch),5 Apws


Rgs
Shan
 

TheMadDBA

Active Member
If you have nmon you can use that to see a lot of useful information... if it isn't installed get it installed :)

For AIX 6.1 you need to pay attention to the values of minperm (min percentage of memory to use for AIX buffer caching). Available through nmon with the m option and also available/settable using the vmo command. Ideally you would want this value to be small (5% or so) but don't change it until you have a plan for the extra memory it will make available. Most likely it is trying to buffer DB blocks, just much slower than -B would.

Client memory usage depends on the startup options and your application. You can use ps auxw and look for the various Progress executables to see how much memory you are actually using per client. nmon also has a t option that shows processes sorted by cpu, io, memory, etc.
 

TomBascom

Curmudgeon
I must not have been clear.

Short (1 minute, 5 minutes, that sort of thing) activity samples, taken when the system is busy, are valuable. 100+ hour averages are not.
 

TomBascom

Curmudgeon
Is each database in a discrete LPAR? Or are they all combined together in a single LPAR? Or something in between? Are the LPARs "dynamic"? What is the entitlement?
 

Shanmugam T

New Member
I must not have been clear.

Short (1 minute, 5 minutes, that sort of thing) activity samples, taken when the system is busy, are valuable. 100+ hour averages are not.
I'll capture the data in coming month end when the system is busy.

Now i have posted below samples with 5 mins interval

Activity - Sampled at 02/25/15 12:57 for 58:32:57.

Event Total Per Sec Event Total Per Sec
Commits 9691756 46.0 Undos 31564 0.1
Record Updates 5477882 26.0 Record Reads 8478764236 40226.2
Record Creates 1731037 8.2 Record Deletes 1107029 5.3
DB Writes 667896 3.2 DB Reads 89032057 422.4
BI Writes 544128 2.6 BI Reads 80685 0.4
AI Writes 392212 1.9
Record Locks 47466634 225.2 Record Waits 222 0.0
Checkpoints 343 0.0 Buffs Flushed 2589 0.0

Rec Lock Waits 0 % BI Buf Waits 0 % AI Buf Waits 0 %
Writes by APW 97 % Writes by BIW 49 % Writes by AIW 89 %
Buffer Hits 100 %
DB Size 130 GB BI Size 6143 MB AI Size 22 MB
FR chain 2112 blocks RM chain 2 blocks
Shared Memory 4198M Segments 1

4 Servers, 516 Users (514 Local, 2 Remote, 0 Batch),5 Apws

---------------------------------------------------------------------------------------



Activity - Sampled at 02/25/15 13:02 for 58:38:02.

Event Total Per Sec Event Total Per Sec
Commits 9696403 45.9 Undos 31593 0.1
Record Updates 5480719 26.0 Record Reads 8480805485 40177.8
Record Creates 1733093 8.2 Record Deletes 1107705 5.2
DB Writes 668592 3.2 DB Reads 89053088 421.9
BI Writes 544519 2.6 BI Reads 80687 0.4
AI Writes 392533 1.9
Record Locks 47505431 225.1 Record Waits 222 0.0
Checkpoints 343 0.0 Buffs Flushed 2589 0.0

Rec Lock Waits 0 % BI Buf Waits 0 % AI Buf Waits 0 %
Writes by APW 97 % Writes by BIW 49 % Writes by AIW 89 %
Buffer Hits 100 %
DB Size 130 GB BI Size 6143 MB AI Size 26 MB
FR chain 2112 blocks RM chain 2 blocks
Shared Memory 4198M Segments 1

4 Servers, 513 Users (511 Local, 2 Remote, 0 Batch),5 Apws


---------------------------------------------------------------------------------------

Activity - Sampled at 02/25/15 13:07 for 58:43:17.

Event Total Per Sec Event Total Per Sec
Commits 9707911 45.9 Undos 31656 0.1
Record Updates 5485690 25.9 Record Reads 8503010160 40222.9
Record Creates 1735530 8.2 Record Deletes 1109303 5.2
DB Writes 669475 3.2 DB Reads 89193193 421.9
BI Writes 545139 2.6 BI Reads 80691 0.4
AI Writes 392940 1.9
Record Locks 47542470 224.9 Record Waits 222 0.0
Checkpoints 343 0.0 Buffs Flushed 2589 0.0

Rec Lock Waits 0 % BI Buf Waits 0 % AI Buf Waits 0 %
Writes by APW 97 % Writes by BIW 49 % Writes by AIW 89 %
Buffer Hits 100 %
DB Size 130 GB BI Size 6143 MB AI Size 31 MB
FR chain 2112 blocks RM chain 2 blocks
Shared Memory 4198M Segments 1

4 Servers, 510 Users (508 Local, 2 Remote, 2 Batch),5 Apws
 
Last edited:

Shanmugam T

New Member
If you have nmon you can use that to see a lot of useful information... if it isn't installed get it installed :)

For AIX 6.1 you need to pay attention to the values of minperm (min percentage of memory to use for AIX buffer caching). Available through nmon with the m option and also available/settable using the vmo command. Ideally you would want this value to be small (5% or so) but don't change it until you have a plan for the extra memory it will make available. Most likely it is trying to buffer DB blocks, just much slower than -B would.

Client memory usage depends on the startup options and your application. You can use ps auxw and look for the various Progress executables to see how much memory you are actually using per client. nmon also has a t option that shows processes sorted by cpu, io, memory, etc.
All OS parameters are set correct.
minperm% = 3
maxperm% = 90
 

TheMadDBA

Active Member
So the OS parameters are set "correct"... but what are the actually used values (nmon will show you this). Just because the min is 3% doesn't mean that you aren't wasting a huge percentage of your memory on AIX buffer cache instead of -B or other progress options.
 

Shanmugam T

New Member
So the OS parameters are set "correct"... but what are the actually used values (nmon will show you this). Just because the min is 3% doesn't mean that you aren't wasting a huge percentage of your memory on AIX buffer cache instead of -B or other progress options.

Uploaded nmon memory output. Please check and let me know ur suggestions. Thanks.
 

Attachments

  • nmon_mem_snap.JPG
    nmon_mem_snap.JPG
    52.1 KB · Views: 9

TheMadDBA

Active Member
So you are actually using (as of that snapshot) 13.3% of the memory for AIX buffer cache. You can use about 5GB of memory for other things before you start using up swap space.

Based on what your -B settings are I would guess that most of your memory is tied up in processes. You need to start investigating with the nmon t and ps auxw commands to find out where your memory is going.

What are the startup options for the progress clients/appservers/etc on the AIX box?
 

Shanmugam T

New Member
Attached snapshot taken from nmon t where are _progress are client process.

I have 6 MFG/Pro applications / db's running on this server with different startup / client parameters and different lang setup. We don't have any appservers or other services running.

Sum of -B of all 6 db's = 14 GB approx.

Sample client parameters.

-cpstream 620-2533 -cpinternal 620-2533 -cpcoll basic -cprcodein 620-2533 -d mdy -yy 1980 -Bt 350 -c 30 -D 100 -mmax 3000 -nb 200 -lkwtmo 180 -s 128 -rereadnolock -noshvarfix -p mf.p;

Thank you very much.
 

Attachments

  • nmon_top_process.JPG
    nmon_top_process.JPG
    103.6 KB · Views: 5

TomBascom

Curmudgeon
Having all of those databases in a single LPAR is probably not a good thing. Especially not a dynamic LPAR.
 

TheMadDBA

Active Member
Agree with Tom on the LPAR point.

All of those processes seem small but you are sorting based on CPU usage and not memory size (use the 4 option of nmon t to sort by size).

It is possible you just have tons of _progres processes that each take 20MB or so. You do have memory available and could try to increase -B but you really need to work on a measured approach of where your actual performance pain points are.

The LPAR setup is not ideal. You probably have application performance issues since it is MFG/PRO which is notorious for that. You might have disk/IO issues related to SAN/LPAR setup. etc etc etc.
 
Top