Question: Backups taking a long time on a server that's been p2v'd

#1
Server: Windows 2012
Progress: 10.2B08
DB Size: 180GB

We just took our physical live server and did a p2v to move it to virtual. We already had moved our Roundtable server and had no issues. The one thing I forgot to test was the backups. We do have Replication but we run backups for troubleshooting and development.

We went live and everything was working well - performance was better and our application was smoother. Then we realized our offsite copy was failing. The backup times went from around 40 minutes to 3.5 hours. I expected some slowness but this one server is extraordinarily different.

We have another p2v server and its backup doubled to 80 minutes... All other VMs take around 80 minutes per online backup.

Just this one machine takes 3+ hours. We even copied that server and made all kinds of changes, trying to fix it and even the copy exhibits the same issues.

We have removed the AV from it... all unnecessary software... thick-provisioned the disks... changed the network adapter... given it more resources... given it less resources... Nothing has corrected the issue and Progress Support has not been helpful.

Before I copy one of our Replicated servers and prep it to replace the p2v'd server - does anyone have any related experience with this?

Thanks
 
#2
Was there a change in the underlying storage, e.g. local disk to SAN?
Have you changed the backup start time?

The more experienced on this forum normally suggest using probkup as a good indication of storage throughput.

Final thought: are you snapshotting the VM on a schedule? Doing that too regularly means you can get stuck in a catch-up loop. If you are, turn the snapshots off and try again.
 

Rob Fitzpatrick

#3
We do have Replication but we run backups for troubleshooting and development.
And for disaster recovery, right? ;)

Just this one machine takes 3+ hours. We even copied that server and made all kinds of changes, trying to fix it and even the copy exhibits the same issues.

We have removed the AV from it... all unnecessary software... thick-provisioned the disks... changed the network adapter... given it more resources... given it less resources... Nothing has corrected the issue and Progress Support has not been helpful.
  • Is probkup writing the file locally?
  • Is it writing to the same disk(s) where the database resides?
  • What can you tell us about the underlying storage of the physical and virtual servers?
  • Does the pre-p2v physical server still exist in its original state?

First I'd try to take OpenEdge out of the equation and see if the problem persists. Can you test write throughput (e.g. CrystalDiskMark or something similar) on the slow server and one of the faster servers and compare the results?

If they are more or less the same, that would point to an OE issue. If the throughput test is much slower on the machine with the 3+ hour backup, then that points to an environment issue: storage, hypervisor, physical host, VM configuration, something like that.
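If a benchmark tool can't be installed on the production box, a repeatable sequential-write check can be scripted. A rough sketch (Python, with a hypothetical test-file name; a real tool like CrystalDiskMark will be more rigorous):

```python
import os
import time

def write_throughput_mb_s(path: str, total_mb: int = 256, chunk_mb: int = 8) -> float:
    """Sequentially write total_mb of data to path and return MB/s.
    A crude stand-in for a sequential-write benchmark."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force the data to disk, not just the OS cache
    elapsed = time.perf_counter() - start
    os.remove(path)  # clean up the test file
    return total_mb / elapsed
```

Run it with the same target path probkup writes to on each server (e.g. `write_throughput_mb_s(r"D:\backup\throughput_test.bin")`) and compare the numbers.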
 
#4
And for disaster recovery, right? ;)
Of course, lol. Replication is nice, but you never know when you'll need to roll back. And we're still archiving AI files as well ;)

  • Is probkup writing the file locally?
  • Is it writing to the same disk(s) where the database resides?
  • What can you tell us about the underlying storage of the physical and virtual servers?
  • Does the pre-p2v physical server still exist in its original state?

First I'd try to take OpenEdge out of the equation and see if the problem persists. Can you test write throughput (e.g. CrystalDiskMark or something similar) on the slow server and one of the faster servers and compare the results?

If they are more or less the same, that would point to an OE issue. If the throughput test is much slower on the machine with the 3+ hour backup, then that points to an environment issue: storage, hypervisor, physical host, VM configuration, something like that.
We are writing locally, and to the same disk the DB is on (this worked fine on the physical server we copied). I'm sure that adds some overhead, but test servers with the same setup are not having these issues.
Disk throughput is good and it matches the test servers we have on the same host (ESXi 6.0) that do not have the drastic slowdown. We've done file copies and a whole slew of other tests... they are all blurring together at this point, as I am in my second week of trying to find the issue.

We still have the old server, and it would take some work to get it back up and isolated on the network. Its C:\ was full, which was preventing any updates from installing and causing grief as we tried to keep enough free space to keep the server functional.
 
#5
Was there a change in the underlying storage, e.g. local disk to SAN?
Have you changed the backup start time?

The more experienced on this forum normally suggest using probkup as a good indication of storage throughput.

Final thought: are you snapshotting the VM on a schedule? Doing that too regularly means you can get stuck in a catch-up loop. If you are, turn the snapshots off and try again.
That's the funny thing... moving files is speedier than on the physical server. And we are not doing any snapshots... even if we were, it's only affecting this one p2v. Our other p2v is slow, but still takes a third of the time of this server, and it has fewer resources.
 

Rob Fitzpatrick

#6
Are all servers at the same levels of Windows patching? E.g. for Spectre and Meltdown. It's a long shot, but I've seen stranger things.

CPU could be a point of contention for probkup. How do the CPU architectures and core speeds compare? Could there be power-management features on the slow server that are parking cores or throttling core speeds?
 

RealHeavyDude

Well-Known Member
#8
Do you have different transaction loads on the different servers during the backup window? In our case (OE 11.6, Sun Solaris, ZFS file systems residing on an EMC SAN) we found out that the transaction load (batch processing) is a big difference maker. The time to take a full online backup of a 500 GB database varies between 4 and 8 hours. When we look at the size of the archived after-image files, we can see that there is a direct relation between transaction load and the time it takes to take the backup.
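Since the poster gauges transaction load from the archived AI files, one quick way to put a number on it per backup window is to total the archive directory's size. A small sketch (Python; the directory name is hypothetical):

```python
import os

def ai_archive_mb(ai_dir: str) -> float:
    """Total size in MB of the files in an AI archive directory --
    a rough proxy for transaction load during the backup window."""
    total = sum(
        entry.stat().st_size
        for entry in os.scandir(ai_dir)
        if entry.is_file()
    )
    return total / (1024 * 1024)
```

Comparing this figure across servers and across backup runs (e.g. `ai_archive_mb(r"D:\aiarchive")`) would show whether transaction load correlates with the slow backups here.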
 

Cringer

ProgressTalk.com Moderator
Staff member
#9
Do you delete the old backup files before taking the new backup? There's a bug in Windows whereby if the file already exists it takes longer to do the backup than if the file isn't there. And it gets exponentially worse the larger the backup file.
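To rule that out in a scripted backup, the stale file can be deleted before probkup runs. A minimal sketch of the pattern (Python; the probkup paths in the comment are hypothetical):

```python
import os

def fresh_backup_target(backup_file: str) -> str:
    """Remove any stale backup file so the backup writes a brand-new file.
    (Overwriting an existing large file has been reported to be slower
    on some Windows builds.)"""
    if os.path.exists(backup_file):
        os.remove(backup_file)
    return backup_file

# Then run the backup against the now-clean target, e.g.:
#   probkup online C:\db\prod D:\backup\prod.bak
```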
 
#10
Did something happen that has made the BI file comparatively large on this server?
The BI file is large on this server, but the time to back it up is small, and the BI size doesn't seem to bother the other backups on similar VMs in the same way. I forgot to truncate it when we were switching and haven't been able to take the database down to eliminate that possibility. I did truncate it on the copy of this VM and it is pretty much the same.
 
#11
Do you have different transaction loads on the different servers during the backup window? In our case (OE 11.6, Sun Solaris, ZFS file systems residing on an EMC SAN) we found out that the transaction load (batch processing) is a big difference maker. The time to take a full online backup of a 500 GB database varies between 4 and 8 hours. When we look at the size of the archived after-image files, we can see that there is a direct relation between transaction load and the time it takes to take the backup.
Considering I have taken backups on multiple test VMs on the same host all day, both while the system is and is not under load, and the results have been similar, I think in our case load contributes about ±20 minutes on the problem server and more like ±5 minutes on the other servers. Just this one server, and any copy of it, exhibits this slowdown.
 
#12
Do you delete the old backup files before taking the new backup? There's a bug in Windows whereby if the file already exists it takes longer to do the backup than if the file isn't there. And it gets exponentially worse the larger the backup file.
I have run it as many ways as I can try. Most of our backups go through the OE Management scheduler, so I usually run that, but I have also run it from the command line a lot, with and without the old backup file being there.
 
#13
Are all servers at the same levels of Windows patching? E.g. for Spectre and Meltdown. It's a long shot, but I've seen stranger things.

CPU could be a point of contention for probkup. How do the CPU architectures and core speeds compare? Could there be power-management features on the slow server that are parking cores or throttling core speeds?
We are fully patched up... we moved to VMs so we could be. The old physical server had a full C:\ and we just couldn't keep enough space on it. A VM allows us to expand that as necessary.

We have a ton of resources available for these servers... we have increased... decreased... changed every setting we can without seeing changes.
 
#14
Progress doesn't have anything to add... they point to VMware for all of these issues and give me suggestions that do not explain why just this server does it. All the VMware issues they suggest would affect all the VMs...

Their last suggestion is what we've been talking about anyway, which is to go ahead and build a fresh VM install and go from there. Nothing seems to make sense, but nothing I have tried seems to fix it.
 