Slow database

TomBascom

Curmudgeon
PROMON summary screen averages taken over extremely long uptimes are pretty much useless.

Even so -- your hit ratio sucks. 19,000 record reads/sec against 500 disk reads/sec is a hit percentage of around 97% -- that stinks. 500+ disk reads/sec sustained over all that time is pretty high for a 47-user system. And you have 9.5 GB of RAM buffering a 27 GB db, so something is churning your data pretty hard. I suspect a lot of table scans are going on.
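Tom's arithmetic can be sketched as a quick back-of-the-envelope calculation. Note this is a simplification: real buffer hit ratios compare logical buffer reads to OS block reads, not record reads to disk reads, so treat the result as a rough indicator only.

```python
# Back-of-the-envelope "hit percentage" from the PROMON rates quoted above.
# Simplifying assumption: each disk read corresponds to one missed read,
# which is only approximately true in practice.

def hit_percentage(record_reads_per_sec: float, disk_reads_per_sec: float) -> float:
    """Approximate cache hit percentage from read rates."""
    if record_reads_per_sec <= 0:
        raise ValueError("record read rate must be positive")
    misses = min(disk_reads_per_sec, record_reads_per_sec)
    return 100.0 * (1.0 - misses / record_reads_per_sec)

print(round(hit_percentage(19_000, 500), 1))  # ~97.4 for the numbers above
```

For a well-buffered OLTP workload you would normally hope to see well above 99%, which is why 97% "stinks" here.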

Why are there 75 servers and only 47 users?

After image extents are defined but PROMON seems to think that after-imaging is not running. That is very, very bad. You are putting all of your data integrity eggs in someone else's basket. In my experience non-Progress persons cannot be trusted to do it right.

"Slow" is bad but corrupt or missing is much, much worse.

http://dbappraise.com/ppt/ai.pptx

Congratulations! You have Type 2 storage areas. However, judging by the names, they seem to be set up "functionally", which is less than ideal: http://dbappraise.com/ppt/sos.pptx

If you require a failover time of a few minutes then you should have a hot spare available. If you want latency on the order of seconds or minutes then you need OE Replication. That works fine with or without virtualized servers; you certainly aren't required to be virtual to support such a scenario. Yes, you can do such things with VMs, but you pay a price for that -- not only in dollars but also in complexity and possibly in performance, especially if you are not knowledgeable about VMware (or whatever virtualization technology you are using).
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
What we have learned so far:

DB block size: 4 KB (default on DBs created on Windows; max is 8 KB; changing it requires dump & load)
BI block size: 8 KB (default; max is 16 KB; changing it requires DB shutdown/proutil -C truncate bi)
BI cluster size: 512 KB (default, which is too low; a better value would be 8 MB or 16 MB to start; changing it requires DB shutdown/proutil -C truncate bi)
Before image writer running? yes
Asynchronous page writers running: 2
Alternate buffer pool in use: yes
Largest blocks/cluster: 8
Large file support enabled: yes
Large index keys enabled: yes
64-bit sequences enabled: yes
Area records per block:
- 2 areas with 8 RPB
- 3 areas with 16 RPB
- 6 areas with 32 RPB
- 1 area with 64 RPB
AI block size: After imaging is not enabled. As Tom said there are AI extents in the database structure. So either someone disabled after imaging or it was never enabled at all after the AI extents were added. Either way it should be enabled as soon as possible. Before you do that I would suggest adding more after image extents to the four you have now.

Performance tuning helps to address this question: will my users have slow access to their data tomorrow?
Enabling and properly configuring after imaging helps to address this question: will my users have any access to their data tomorrow? Make this a priority.

Without knowing more about the data in your tables and where the tables are it is difficult to say that these values are incorrect. However I suspect that at least some of them are too low. Records per block does not matter for a storage area if it contains only indexes, not tables. If it is too low for an area that contains tables then that means that the database blocks in that area will be more empty than they should be. This in turn means that a given amount of logical I/O (e.g. record reads) will result in more physical I/O (disk reads) than is necessary. It also means that the memory you have allocated for the primary and alternate database buffer pools (with the -B and -B2 startup parameters; about 9 GB in total) will be used inefficiently. This can hurt performance.
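The relationship between mean record size and records per block can be sketched with a simple rule-of-thumb calculation. This is illustrative only: the per-record overhead figure is an assumption for the sketch, not an official OpenEdge constant, and whether you round the result up or down to a power of two is a judgment call (the thread's point is that rounding too far down wastes block space).

```python
# Rough rule of thumb for picking a records-per-block (RPB) value for a
# table area, given a 4 KB block and the mean record size from dbanalys.
# RPB must be a power of two between 1 and 256. The 20-byte per-record
# overhead below is an assumed illustration value.

BLOCK_SIZE = 4096
RECORD_OVERHEAD = 20  # assumed bytes of per-record overhead

def suggest_rpb(mean_record_size: int) -> int:
    """Largest power-of-two RPB whose records still fit in one block."""
    fit = BLOCK_SIZE // (mean_record_size + RECORD_OVERHEAD)
    rpb = 1
    while rpb * 2 <= fit and rpb < 256:
        rpb *= 2
    return rpb

for mean in (60, 120, 500):
    print(mean, suggest_rpb(mean))
```

If an area holding tables with ~60-byte records is set to 8 RPB, each 4 KB block carries well under 1 KB of data, which is exactly the "mostly empty blocks" inefficiency described above.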

I look forward to seeing your startup parameters (the extract from the database log file). Also, if you can run a db analysis report, post it here. It will likely be too large to copy and paste into a post, but you can attach it as a text file. Don't run it during business hours: like a backup, a db analysis report scans every block in the database, so application performance may be impacted while it is running.
 

TomBascom

Curmudgeon
I'm suspicious that the RPB settings are originally from the "avoid fragmentation at all costs" white paper. That calculation results in RPB settings that are much, much too low and trades vast swathes of disk space for a "benefit" that hasn't been useful since OE10 was released.
 

Bertrand

New Member
Hello everybody. I haven't forgotten this case. I'm just waiting for tonight (my time) to run a db analysis. I will have a two-hour time frame to do so; I hope it will be enough.
What does RPB mean? (Sorry, I'm a system architect, not an experienced DBA.)
 

TomBascom

Curmudgeon
It looks suspiciously like the RPB settings are on the low side.

If confirmed (by comparing against the average row sizes shown by dbanalys), that would mean that each IO op is fetching less data than it should, so more IO is needed to do the same work. Plus a lot of your memory would be wasted, so your buffer usage would be poor, further aggravating your IO load. Both of those circumstances mean that things probably run slower than they would otherwise.

It is hard to say if it matters -- that depends on your workload and your user expectations. But you're posting about a slow database so it seems pretty likely that, if confirmed, it is at least a factor.
 

Bertrand

New Member
Why are there 75 servers and only 47 users?

I frankly don't know... As far as I'm involved as the infrastructure player, I provided 200 TSE sessions load-balanced across 5 servers. So, for me, there are (at most) 200 clients and a single application server. The 47 users at the time I took the PROMON extract doesn't bother me, but I have absolutely no idea what a "server" is or can be here, other than the DB server.

Especially if you are not knowledgeable about VMWare (or whatever virtualization technology you are using).

We have more than 3 years of experience with VMware and Linux LXC, and we have deployed database infrastructure, mostly Oracle or SQL Server, but smaller in size (1st point) and with metrics (2nd point) that allowed us to optimize the compromise between performance and reliability.
 

jurriaan

New Member
Hello everybody. I haven't forgotten this case. I'm just waiting for tonight (my time) to run a db analysis. I will have a two-hour time frame to do so; I hope it will be enough.

Please note that you can always run the db analysis against a copy of the database restored from your backup on another machine, without any time constraints.
This can be as simple as the following, provided you've already transferred the backup file(s) to the new machine:

- edit <insert database name here>.st to make sure the paths correspond to the layout on the test machine
- prorest <insert database name here> <insert backup file name here>
- proutil <insert database name here> -C dbanalys > analys.txt

Testing the restore of a backup is always good, and waiting for a time frame which may not be long enough can be frustrating.
 

Bertrand

New Member
Hello everybody. I generated the dbanalysis.txt file with the command "proutil P:\Progress\Wrk\Pillaud\BD\evolubat -C dbanalys -Bp 10 > dbanalysis.txt".
I'm uploading the file with this post.
 

Attachments

  • dbanalysis.txt
    312.7 KB

TomBascom

Curmudgeon
Things I notice:

1) You've got some very long RM chains for some objects (thousands, even tens of thousands of blocks). That's indicative of RPB being wrong (or the create/toss limits needing adjustment).

2) Mean record size within storage areas is very inconsistent. The names are in French and my French was never very good, but it looks to me like objects in your storage areas are grouped by function rather than by the attributes of those objects. Attributes would be things like a common range of mean record sizes, activity level, and so on; "by function" is "orders", "history", "ar", "ap", etc. By function is NOT the right way to do it.

3) You have a lot of large indexes with very poor utilization. A dump & load and/or index rebuild/compact might be helpful.

4) You do have tables with small rows that could benefit from larger rows per block.

5) Some tables are plenty large enough to justify a blocks per cluster value of 512.
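Point 5 can be illustrated with simple arithmetic: at a 4 KB block size, a 512-block cluster is 2 MB, so only tables that would fill at least several such clusters benefit from that setting. The "several clusters" threshold below is an assumed rule of thumb for the sketch, not an official OpenEdge recommendation.

```python
# Illustrative sizing check: estimate a table's data size from
# dbanalys-style figures (record count and mean record size) and ask
# whether it would fill at least `min_clusters` 512-block clusters.
# The min_clusters value is an assumed rule of thumb.

BLOCK_SIZE = 4096

def cluster_bytes(blocks_per_cluster: int) -> int:
    return blocks_per_cluster * BLOCK_SIZE  # 512 blocks -> 2 MB at 4 KB

def justifies_large_cluster(records: int, mean_size: int,
                            blocks_per_cluster: int = 512,
                            min_clusters: int = 4) -> bool:
    """True if the table's estimated data would span several clusters."""
    return records * mean_size >= min_clusters * cluster_bytes(blocks_per_cluster)

print(justifies_large_cluster(5_000_000, 120))  # large table -> True
print(justifies_large_cluster(2_000, 80))       # tiny table -> False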
 