Paul Koufalis
Guest
- We know it's not NUMA (single processor with 10 cores, per www-01.ibm.com/.../ssialias).
- We know it's not disk I/O (your disks are doing nothing).
- If it were a memory/swapping issue I'm certain you would have seen it, so let's rule that out.
- You say the server is not CPU starved, but is the batch process single-threaded? Even so, the new cores should be faster than the old cores.
- What's left? Kernel calls? Weird nice levels? A new Progress issue with p8?

I had a very similar issue going from p6 to p7, and it turned out to be a UNIX SILENT chmod that was run half a million times. I'm not saying this is your issue, but it's definitely time to think outside the box.

1. You said you think it's a whole string of jobs run one after the other. Find out how long EACH one takes on the old box vs. the new box. That way we can see if it's a generalized issue or one particular job that is misbehaving.
2. Get some DB stats. Download ProTop (dbappraise.com/protop.html) and use it to see what's going on. ProTop is much more information-dense than promon. Are these read-intensive or write-intensive batch jobs?
3. Triple-check the DB startup parameters. This could very well be an "oops!" moment. Don't forget BI block size and cluster size.
4. Truss the processes and see if they are doing anything interesting at the kernel level. IBM has a post-truss cruncher that chews up the output and spits out a nice report. That's how we saw the UNIX SILENT issue: an abnormally high number of fork() calls.
5. Are you running OpenEdge Replication too?
6. Did you dump and load going from the old box to the new? Or make any changes to the DB, like storage area stuff?
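For point 1, once you have the elapsed time of each job on both boxes, something like the rough Python sketch below can show whether the slowdown is across the board or confined to one or two jobs. The job names, the timings and the 1.5x threshold are all made up for illustration; plug in your own numbers.

```python
def compare_job_timings(old_secs, new_secs, threshold=1.5):
    """Compare per-job elapsed seconds on the old box vs. the new box.

    Returns (rows, verdict). Each row is (job, old, new, ratio), where
    ratio = new / old. The threshold is an arbitrary cut-off for "slow".
    """
    rows = []
    # Only compare jobs that were timed on both boxes.
    for job in sorted(old_secs.keys() & new_secs.keys()):
        if old_secs[job] <= 0:
            continue  # skip unusable timings rather than divide by zero
        ratio = new_secs[job] / old_secs[job]
        rows.append((job, old_secs[job], new_secs[job], ratio))

    slow = [job for job, _, _, ratio in rows if ratio >= threshold]
    if not slow:
        verdict = "no job exceeds the threshold"
    elif len(slow) == len(rows):
        verdict = "generalized slowdown"
    else:
        verdict = "slowdown concentrated in: " + ", ".join(slow)
    return rows, verdict


if __name__ == "__main__":
    # Hypothetical timings in seconds, NOT real measurements.
    old = {"job_a": 600.0, "job_b": 1200.0, "job_c": 300.0}
    new = {"job_a": 620.0, "job_b": 4800.0, "job_c": 310.0}
    rows, verdict = compare_job_timings(old, new)
    for job, o, n, ratio in rows:
        print(f"{job:10s} old={o:8.0f}s new={n:8.0f}s ratio={ratio:5.2f}x")
    print(verdict)
```

If only one job stands out, that is the one to truss first.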