Backup Is Very Slow.

Jack@dba · Oct 30, 2017

Hi,

I created test databases at the application team's request. Before releasing these databases I took an offline backup of each. But for the larger databases the backup runs for a very long time.

Can anyone tell me why the backup is taking so long?

For example:
Database size is 65 GB.

Taking an offline backup took 5 hours.
Our version is old: 9.1E.

We checked with the OS team; they say everything is fine on the server side.
Server memory: 3 GB used out of 3.5 GB. Paging space: 21% used out of 4.6 GB.

Db.pf
======================
-B 65000 # Number of (8K) Blocks in database buffer
-bibufs 200 # Number of before-image buffers
-aibufs 300 # Number of after-image buffers
-Mi 1 # Min processes on a client server
-Ma 15 # Max number of REMOTE clients per db server
-Mn 30 # Max number of REMOTE client servers
-Mxs 32768 # Shared memory overflow size (override)
-spin 15000 # Number of spin lock retries
-L 56000 # Lock Table entries

db.lg

09:07:33 BROKER 0: Multi-user session begin. (333)
09:07:33 BROKER 0: Begin Physical Redo Phase at 0 . (5326)
09:07:46 BROKER 0: Physical Redo Phase Completed at blk 833 off 2628 upd 0. (7161)
09:07:46 BROKER 0: Started for tstslc using tcp, pid 198472. (5644)
09:07:46 WDOG 31: Started. (2518)
09:07:47 BROKER 0: Progress OpenEdge Release 9.1E on AIX. (4234)
09:07:47 BROKER 0: Server started by pgresdba on batch. (4281)
09:07:47 BROKER 0: Started using pid: 1864. (6574)
09:07:47 BROKER 0: Physical Database Name (-db): /dbadmin/tst/db. (4235)
09:07:47 BROKER 0: Database Type (-dt): PROGRESS. (4236)
09:07:47 BROKER 0: Force Access (-F): Not Enabled. (4237)
09:07:47 BROKER 0: Direct I/O (-directio): Not Enabled. (4238)
09:07:47 BROKER 0: Number of Database Buffers (-B): 55000. (4239)
09:07:47 BROKER 0: Maximum private buffers per user (-Bpmax): 64. (9422)
09:07:47 BROKER 0: Excess Shared Memory Size (-Mxs): 33554432. (4240)
09:07:47 BROKER 0: The shared memory segment is not locked in memory. (10014)
09:07:47 BROKER 0: Current Size of Lock Table (-L): 56000. (4241)
09:07:47 BROKER 0: Hash Table Entries (-hash): 18289. (4242)
09:07:47 BROKER 0: Current Spin Lock Tries (-spin): 15000. (4243)
09:07:47 BROKER 0: Number of Semaphore Sets (-semsets): 1. (6526)
09:07:47 BROKER 0: Crash Recovery (-i): Enabled. (4244)
09:07:47 BROKER 0: Database Blocksize (-blocksize): 8192. (6573)
09:07:47 BROKER 0: Delay of Before-Image Flush (-Mf): 3. (4245)
09:07:47 BROKER 0: Before-Image File I/O (-r -R): Reliable. (4247)
09:07:47 BROKER 0: Before-Image Truncate Interval (-G): 60. (4249)
09:07:47 BROKER 0: Before-Image Cluster Size: 4194304. (4250)
09:07:47 BROKER 0: Before-Image Block Size: 8192. (4251)
09:07:47 BROKER 0: Number of Before-Image Buffers (-bibufs): 200. (4252)
09:07:47 BROKER 0: BI File Threshold size (-bithold): 0.0 Bytes. (9238)
09:07:47 BROKER 0: BI File Threshold Stall (-bistall): Disabled. (6552)
09:07:47 BROKER 0: After-Image Stall (-aistall): Not Enabled. (4254)
09:07:47 BROKER 0: Number of After-Image Buffers (-aibufs): 300. (4256)
09:07:47 BROKER 0: Storage object cache size (-omsize): 1024 (8527)
09:07:47 BROKER 0: Maximum Number of Clients Per Server (-Ma): 15. (4257)
09:07:47 BROKER 0: Maximum Number of Servers (-Mn): 31. (4258)
09:07:47 BROKER 0: Minimum Clients Per Server (-Mi): 5. (4259)
09:07:47 BROKER 0: Maximum Number of Users (-n): 76. (4260)
09:07:47 BROKER 0: Host Name (-H): TEST. (4261)
09:07:47 BROKER 0: Service Name (-S): dbtest. (4262)
09:07:47 BROKER 0: Network Type (-N): tcp. (4263)
09:07:47 BROKER 0: Character Set (-cpinternal): iso8859-1. (4264)
09:07:47 BROKER 0: Parameter File: /dbadmin/tst/db.pf. (4282)
09:07:47 BROKER 0: Minimum Port for Auto Servers (-minport): 18001. (5648)
09:07:47 BROKER 0: Maximum Port for Auto Servers (-maxport): 23000. (5649)
09:07:47 BROKER 0: This broker supports both 4GL and SQL server groups. (8865)

Rob Fitzpatrick · Oct 30, 2017

Jack@dba said:
We checked with OS team server side everything is fine

How convenient.

Jack@dba said:
But for larger databases backup is running for very long.

Is this new? Were backups of this size previously fast and now slow, or have they always been slow?
What do you know about the type and configuration of the storage hardware? Do you know what kind of read and write throughput it can achieve outside of Progress? The OS team should be able to provide statistics on this. If they can't then I wouldn't believe that "everything is fine".

For a point of reference, I'm looking at a client's DB log; they backed up a 35 GB database (online) in 14 minutes (running on v11.3.3). So five hours for 65 GB is very, very slow.
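
To put those two data points on the same scale, here is a rough Python sketch of the arithmetic. It assumes 1 GB = 1024 MB and treats each backup as running at a constant average rate; the function name is just for illustration.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average backup throughput in MB/s, taking 1 GB as 1024 MB."""
    return size_gb * 1024 / (minutes * 60)

# The 65 GB offline backup that took 5 hours (test server, 9.1E)
slow = throughput_mb_per_s(65, 5 * 60)
# The 35 GB online backup that took 14 minutes (reference client, 11.3.3)
fast = throughput_mb_per_s(35, 14)

print(f"slow: {slow:.1f} MB/s")       # slow: 3.7 MB/s
print(f"fast: {fast:.1f} MB/s")       # fast: 42.7 MB/s
print(f"ratio: {fast / slow:.1f}x")   # ratio: 11.5x
```

An average of under 4 MB/s is low for almost any disk subsystem, which is why the storage questions matter more than CPU or memory here.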

Jack@dba · Oct 30, 2017

Thanks Rob...

Is this new? Were backups of this size previously fast and now slow, or have they always been slow?
No. On the production box it takes only 15 minutes to take the backup, but on the test server it takes this long.

What do you know about the type and configuration of the storage hardware? Do you know what kind of read and write throughput it can achieve outside of Progress? The OS team should be able to provide statistics on this. If they can't then I wouldn't believe that "everything is fine".

The OS team already reported that during backup runs CPU utilization is normal (below 70%) and physical memory usage is 20% of 4.5 GB.
I ran this backup during non-business hours, when nobody was connected to the server.

Rob Fitzpatrick · Oct 30, 2017

So a more complete statement of the problem is: "offline production database backup takes 15 minutes; offline test database backup, same size, takes five hours (20 times longer)".

So the obvious questions are:

How does the production storage hardware differ from test in terms of specifications and capability?
How does the production storage hardware differ from test in terms of workload?

At least one of these must be very different between production and test.

You have provided server information, about memory utilization and CPU utilization during a backup. That's fine but I was looking for details about storage:

Where do the databases reside? Direct-attached disks inside the server enclosure? SAN? NAS?
If the array is remote, how is it connected to the server? What is the speed, throughput, and latency of the connection(s)?
What kind of disks are they? HDD? SSD? What are the specs?
What is the configuration of the array(s)? RAID 1? RAID 10? RAID 5? RAID 6? JBOD? Other?
Where is the backup output destination, relative to the database? Different disks? Same disks? Local or remote?
Apart from your backup, what else is happening on those disks/controllers during that backup that would impact its performance?
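
If the OS team can't supply throughput figures, a crude probe outside of Progress can at least give a ballpark. The following is a minimal sketch, assuming a Python interpreter is available on the host (not a given on an old AIX box; `dd` with a timer does the same job). The function name, file size, and chunk size are arbitrary choices for illustration. Note that the read pass will likely be served from the filesystem cache, since the file was just written, so treat the read figure as an upper bound.

```python
import os
import tempfile
import time

def measure_throughput(directory: str, size_mb: int = 64, chunk_mb: int = 1):
    """Write then read a scratch file in `directory`.

    Returns (write_MB_per_s, read_MB_per_s).
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    chunks = size_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Sequential write, forced to the device before the timer stops
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        write_s = time.perf_counter() - start

        # Sequential read of the same file, chunk by chunk
        start = time.perf_counter()
        total = 0
        with open(path, "rb") as f:
            while True:
                data = f.read(len(chunk))
                if not data:
                    break
                total += len(data)
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the scratch file

    mb = total / (1024 * 1024)
    return mb / write_s, mb / read_s

if __name__ == "__main__":
    w, r = measure_throughput(tempfile.gettempdir())
    print(f"write: {w:.1f} MB/s, read: {r:.1f} MB/s")
```

Running it once against the directory holding the database and once against the backup destination, on both production and test, gives four numbers that are directly comparable.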

Jack@dba said:
During non-business hours i tried this backup nobody connected to the server

Which backup specifically? The fast one on production or the slow one on test?

ForEachInvoiceDelete · Oct 31, 2017

Our primary database is currently around 300 GB and doesn't take that long.

You're not using network mounts and backing up across the network, are you?

TomBascom · Oct 31, 2017

He’s running v9. If the storage is from the same era then 5 hours isn’t that bad.... chiseling bits on clay tablets and baking them takes time.

JamesBowen · Oct 31, 2017

I know this might not be of any value but I'll chuck in my two cents anyway. 12 years ago I had a similar problem with the backups taking a lot longer than normal and the server admin guys also reported that everything was "Okay, no problems on their side".
Well, after a few days of headbanging I went to the server room, looked at the server myself, and found that one of the hot-swappable drives was not sitting quite flush in the chassis. Amazingly, this was not showing up as an error or warning in the HP ProLiant monitoring software. Nobody (except me) actually looked at the physical server to inspect it. The server admin guys re-seated the drive and performance was back to normal.

Rob Fitzpatrick · Nov 1, 2017

Thanks Cecil. On that note, a few years ago a client had a problem where performance on their UAT server suddenly severely degraded. Of course we looked at the application and databases (guilty until proven innocent...) but it turned out the drive controller's battery was dying and had stopped recharging. This battery protects the cache contents in a power failure. When this happens, the controller turns off caching. Again, the server admin didn't notice at first. When he did, and replaced the battery and rebooted, performance went back to normal. So yes, performance problems can stem from lots of different causes.

jdpjamesp · Nov 1, 2017

What OS is this? If it's Windows then make sure you delete the old backup file before creating the new one as this speeds it up immensely.

Rob Fitzpatrick · Nov 1, 2017

Cringer said:
What OS is this?

It's AIX.

jdpjamesp · Nov 1, 2017

Rob Fitzpatrick said:
It's AIX.

Thanks. That rules out my little 'feature' then.

Jack@dba · Nov 1, 2017

Thanks ....

Our OS version is AIX 5.3, and IBM support for it has ended.

On the first day, the 83 GB backup took 5:30 hours.
Yesterday, the 83 GB backup took 2 hours.
Today, the 83 GB backup took only 1:30 hours.

As Cecil said, maybe we have an issue with the storage drives.

TheMadDBA · Nov 1, 2017

Look into the iostat command on AIX. There are options to track disk IO and total throughput on the host.

You can run a sample during the fast backup and during the slow backup and the results should be pretty clear.

There are also a host of vmo and ioo options that control how memory is used and how IO is performed. Run a vmo -a and an ioo -a on both hosts and compare the output.
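
Comparing two long listings by eye is error-prone, so here is a small Python sketch of the comparison. It assumes the output of each command was saved to text and consists of `name = value` lines; lines in any other shape are skipped. The tunable names and values in the sample are placeholders, not real settings.

```python
def parse_tunables(text: str) -> dict:
    """Parse lines of the form 'name = value' into a dict; skip other lines."""
    tunables = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            tunables[name.strip()] = value.strip()
    return tunables

def diff_tunables(prod: dict, test: dict) -> dict:
    """Return {name: (prod_value, test_value)} for every tunable that differs."""
    names = sorted(set(prod) | set(test))
    return {n: (prod.get(n), test.get(n)) for n in names if prod.get(n) != test.get(n)}

# Placeholder sample output, for illustration only
prod_out = """
        tunable_a = 960
        tunable_b = 1088
"""
test_out = """
        tunable_a = 120
        tunable_b = 1088
"""

diff = diff_tunables(parse_tunables(prod_out), parse_tunables(test_out))
for name, (p, t) in diff.items():
    print(f"{name}: production={p} test={t}")   # tunable_a: production=960 test=120
```

A tunable present on only one host shows up with `None` on the other side, which is itself worth knowing.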

System admins always focus on CPU and memory for some reason and never the Disk IO or network IO. Make them care about both of them.
