Serious performance problems

TomBlue999

Hello Community!

I know this is maybe thread number 1000 concerning performance, but I have a problem I cannot cope with.

A major part of our software (creating orders) has runtimes ranging from about 2000 ms to 30 seconds and more. I cannot reproduce the problem: in every test I ran, the maximum was 3000 ms. Of course, one solution would be to redevelop this part of the software (it would be worth doing ;)), but that is about 3-6 months of work, and we do not have that time yet. On the other hand, it is very strange that the gap between minimum and maximum runtimes is this large.

Server: Windows 2003 SP2, 3 GB RAM, RAID 10, 1x Xeon 2 GHz (dual core), Progress 10.1A Enterprise, 2 APWs, BI and AI writers, -B 25000 (still about 1 GB of free RAM), about 60 active users (-n 76), max. 4 clients per server, max. 19 servers
Client: 2x Windows 2003 Terminal Server, well equipped

What I have done so far:
  • Checked server performance with protop.p: there were no problems with physical reads or buffers flushed at checkpoints while users had to wait.
  • Tried tuning client parameters (-TB, -TM, -mmax); no difference detected.
  • Checked client statistics (-y); there do not seem to be any problems with temp-file I/O performance.
I also ran readprobe.p. Because I got very good values on the server, I also tried readprobe.p in a client/server configuration with 100 users. One remarkable thing I noticed is that network utilization never exceeded 4% (of 2 Gbit, 2 teamed HP adapters). I do not have any network problems with other applications, and file copies between the same servers work perfectly!

Has anybody had problems like this, especially with networking?
Has anybody had good experiences tuning the message buffer size parameter (-Mm)?
Any other ideas?

Thanks
Thomas
 
Just curious... have you checked to see if you are swapping on the server using task manager or one of the other memory tools out there?
 
Offhand I do not see -spin in your startup parameters and -B 25000 is really quite low.

Increasing -Mm can be beneficial but you must do it everywhere and, so far, there isn't anything in what you've said that suggests that it would be relevant.

You might want to seriously consider engaging an experienced performance consultant to come in and take a look ;)
 
Your delay may be related to activity on the Windows server. Have you considered using perfmon to track the server's resources?

Look at the db checkpoints - do they line up with the app delays?
 
When we've had that kind of delay, it has normally been the disk or the CPU running at 100%.

Seeing that your -B is very small, I suspect you must read most of your data directly from disk and not from memory (unless you have a very small database). Assuming an 8K block size, 25000 blocks is about 195MB of buffer cache. That could put a lot of strain on your disks, depending on how many disks you have in your RAID 10 configuration and whether you have separated the BI and AI from your database files.
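
For anyone who wants to redo that arithmetic with their own numbers, here is a rough sketch in Python. The function name is made up for illustration, and the 8 KB block size is only the assumption stated above; check the actual block size of your database before relying on the result.

```python
def buffer_cache_mb(blocks: int, block_size_kb: int) -> float:
    """Approximate buffer pool size in MB for a given -B value.

    blocks        -- the -B startup parameter (number of database buffers)
    block_size_kb -- database block size in KB (assumed, not read from the db)
    """
    return blocks * block_size_kb / 1024


# -B 25000 is taken from the original post; 8 KB is the assumed block size.
print(f"{buffer_cache_mb(25000, 8):.1f} MB")  # 195.3 MB
```

The same function can be used to see how large -B could become before it eats into the roughly 1 GB of free RAM mentioned in the first post.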

I would start with perfmon to check the disk and cpu.
 
Ok, guys, thanks a lot. Here is what I'm going to do:
  • Try increasing -B (although I cannot see that this is the problem: the buffer hit rate is about 99.32%, and there are no extensive OS reads in the particular cases when users have to wait)
  • Experiment with the -spin parameter (currently at the default: 12000)
  • Dig deeper with perfmon. At first glance there are no swapping problems, no 100% CPU or disk usage, and so on.
@cj_brandt: I guess that when no buffers are flushed at checkpoints, it is OK, isn't it?

But does anybody know why there is only 3% network load (of 2 Gbit) when starting 100 users (readprobe.p, just reading) in client/server mode? This seems strange to me.

Many Thanks
Thomas
 
A performance goal for checkpoints is to be at least 1 or 2 minutes apart with 0 buffers flushed during peak activity.
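
If you note down checkpoint times and flushed-buffer counts by hand (for example from protop.p), that goal can be checked with a few lines of Python. This is only a sketch: the function name, the input layout, and the sample values are all invented for illustration and are not taken from any Progress tool or from this system.

```python
from datetime import datetime, timedelta


def check_checkpoints(checkpoints, min_gap=timedelta(minutes=1)):
    """Return warnings for checkpoints that miss the goal.

    checkpoints -- list of (timestamp, buffers_flushed) tuples in time order
    min_gap     -- minimum acceptable time between two checkpoints
    """
    problems = []
    for i, (ts, flushed) in enumerate(checkpoints):
        # Goal: no buffers flushed at a checkpoint.
        if flushed > 0:
            problems.append(f"{ts:%H:%M:%S}: {flushed} buffers flushed")
        # Goal: checkpoints at least min_gap apart.
        if i > 0:
            gap = ts - checkpoints[i - 1][0]
            if gap < min_gap:
                problems.append(
                    f"{ts:%H:%M:%S}: only {int(gap.total_seconds())}s after previous checkpoint"
                )
    return problems


# Hypothetical sample values, not measurements from the system discussed here.
sample = [
    (datetime(2008, 1, 1, 10, 0, 0), 0),
    (datetime(2008, 1, 1, 10, 3, 30), 0),
    (datetime(2008, 1, 1, 10, 4, 5), 12),
]
for line in check_checkpoints(sample):
    print(line)
```

With the sample values, only the third checkpoint is reported: it flushed buffers and came 35 seconds after the previous one.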
 
The network isn't the bottleneck when readprobe is running. The bottleneck is intended to be the latch mechanism inside the db code.
 