I have an odd problem that I hope someone else has encountered; I have not found anything like it on this or any other forum.
*******
Progress v10.2a, SP01, HF02 (Enterprise)
The 3 servers involved are all Windows 2003, updated, and physical
Server 1 is the DB server and runs the backup. It is clustered with server 2; data and backups are on an attached SAN.
Server 2 runs the robot process program
Server 3 holds the hot folders that XML files are dropped to.
*******
We have multiple production plants and are running 11 Progress DBs. 3 of these locations receive a large number of XML files daily from a customer during the 4th quarter. I have robot processes (Progress program) that look for these XML files and create the orders in the ERP systems.
All day long this process just hums along, for the most part processing files as quickly as they come in, sometimes up to 15 files a minute into each of the 3 ERPs.
I have a backup script that starts at 6pm and progresses through all 11 DBs one at a time (excerpt here):
********************
call probkup online e:\hsi\data\pc\gams1.db e:\Backups\oa\pc.bak >> e:\Backups\oa\Pclog.txt
call probkup online e:\hsi\data\nave\gams1.db e:\Backups\oa\nave.bak >> e:\Backups\oa\navelog.txt
call probkup online e:\hsi\data\he\gams1.db e:\Backups\oa\he.bak >> e:\Backups\oa\Helog.txt
etc…
*********************
Here’s the problem: a couple of minutes after the BU script starts, 1 of my DBs slows to a crawl. The DB stays up and those XML files will still process, but instead of 15/minute it will sometimes take 20-30 minutes to process the same files that were being processed all day, right up until that time.
The DB in question is the 2nd DB in the backup script, yet it is affected as soon as the 1st DB BU starts and continues to be affected until the last DB BU has completed. At the end of the BU script all Progress BU files are copied off to another server (using RoboCopy), where they are then backed up to tape, but as I say, the DB in question returns to normal processing once that last DB probkup completes; the RoboCopy process does not affect it.
The ERP application itself is affected during this time as well, but not as badly: it is mostly operable, though noticeably slow.
As I mentioned, there are 3 locations (3 separate DBs) that run this exact same process, on the same servers, receiving and processing the same XML files from the same customer. Only this DB is affected; the other 2 continue to process those XML files at the standard 15/minute clip during the whole BU process.
***********
A couple things I’ve tried or monitored:
*To prove my theory that the BU starting was causing the problem, I moved the start from 6pm to 6:30pm, and then again to 8pm. Each time the slowdown started 1-4 minutes after the BU script started.
*When this issue occurs I have checked that there is no unusual/excessive server activity happening - CPU and memory usage are low.
*ProMon – overall buffer hits are good, at least in the 95-98% range, but I can see that when this issue is happening there seems to be little going on overall (when there should be a flurry of reads and writes to the DB).
*Tried commenting out the affected DB from the BU script: no effect (didn’t really expect this to help, but tried anyway).
*The DBs and the backup files created sit on the same SAN. Thinking there could be some conflict there, I changed the BU last night to create the files in a different location (not on the SAN): still the same results.
*Neither the backup log files nor the DB log files reveal any issues.
***********
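To pin down the "1-4 minutes after the BU script started" timing more precisely than by watching, I could bucket the robot's completion times per minute and find the first minute that drops below the normal rate. This is only a minimal sketch: it assumes I can get a timestamp for each processed file from somewhere (for example from logging added to the robot program), and the function names and the threshold are my own placeholders, not anything Progress provides.

```python
from collections import Counter
from datetime import datetime, timedelta


def files_per_minute(timestamps):
    """Count processed files in each whole minute: {minute_start: count}."""
    counts = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return dict(sorted(counts.items()))


def first_slow_minute(per_minute, start, end, threshold):
    """Return the first minute in [start, end) with fewer than
    `threshold` files processed, or None if no minute is that slow."""
    minute = start.replace(second=0, microsecond=0)
    while minute < end:
        # Minutes with no entries at all count as zero files processed.
        if per_minute.get(minute, 0) < threshold:
            return minute
        minute += timedelta(minutes=1)
    return None
```

Calling `first_slow_minute` with the backup start time as `start` gives a value to compare directly against the start of each probkup line in the script.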
Appreciate any thoughts. The big problem I will have at this point is testing anything. This customer is winding down for the season and will not be active until 4th quarter next year, so I will probably have to come up with some creative test plan.
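One idea for that test plan is to replay saved XML files into a hot folder at the same 15/minute rate, against a test copy of the DB, while the backup script runs. The sketch below is just that, a sketch: the function name, the file naming, and the write-then-rename step are my own assumptions, and the payload would have to be a real order file saved from this season rather than anything made up.

```python
import time
from pathlib import Path


def replay_files(hot_folder, payload, count, per_minute=15, sleep=time.sleep):
    """Drop `count` copies of `payload` into `hot_folder` at a steady rate.

    `sleep` is injectable so the pacing can be checked without waiting.
    Returns the list of files written.
    """
    folder = Path(hot_folder)
    folder.mkdir(parents=True, exist_ok=True)
    interval = 60.0 / per_minute  # seconds between files
    written = []
    for i in range(count):
        # Write under a temporary name, then rename, so a process polling
        # for *.xml never sees a half-written file.
        tmp = folder / f"test_order_{i:05d}.tmp"
        final = folder / f"test_order_{i:05d}.xml"
        tmp.write_text(payload, encoding="utf-8")
        tmp.replace(final)
        written.append(final)
        if i < count - 1:
            sleep(interval)
    return written
```

Running one of these per location would let me check whether the same single DB slows down under synthetic load, without waiting for next 4th quarter.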
Thanks. Happy Holidays!