Disk Utilisation 100% during dump process

anoobbhaskar

New Member
I am doing a parallel dump on an HP-UX system running OpenEdge 10.1B. My database size is around 70 GB and system memory is 8 GB. The database is offline while performing the proutil dump with the -RO option.
$DLC/bin/proutil $DUMPDB/ttgres -C dump cabininv $DCABININV -RO

The current issue is that disk I/O hits 100% when we start executing the dump process. We are using CX4120 storage with RAID 10 for storing the data. I have checked with the vendor and they confirmed that there is no problem with the device. Memory is at 55% and CPU is around 25% during the dump. I would be able to execute more scripts simultaneously if disk I/O were lower.

Can anyone suggest how we can reduce the disk I/O?



[Attachment: sar output screenshot (1011sar op.JPG)]
 
100% disk utilization is the objective of running a parallel dump. Disk is your slowest resource, so that is where the bottleneck is going to be -- which is exactly what 100% utilization indicates.

More threads do not necessarily mean that you finish any faster -- there is a point where adding threads starts to reduce throughput because they are contending with each other.
 
Will it help if I dump to different logical volumes or a different disk array?
 
Yes. Reading from one device and writing to another will give you a lot more bandwidth. In the simplistic case of 2 disks that are the same it would (more or less) double your throughput. Maybe more since the target disk could be largely sequential access (although the multi-threading will tend to randomize that).
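
As a minimal sketch, assuming the dump target is a filesystem on a separate array (the /dump2 path and directory name are made up):

# Database files live on one array; dump files are written to a
# filesystem on a different array so reads and writes do not compete.
DUMPDIR=/dump2/bindumps        # hypothetical mount point on the second array
mkdir -p $DUMPDIR
$DLC/bin/proutil $DUMPDB/ttgres -C dump cabininv $DUMPDIR -RO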
 
I have a question.

Are there any ways to find out the factors affecting a dump? We have performed a binary dump on two different machines as a background process. The two machines have the same configuration, but the dump times were different. I have not seen any memory (max 72%) or CPU (max 40%) issue; I/O was at 100% from the beginning. The Progress version is 10.1B (RISC), and both servers are running HP-UX 11i. Please let me know if you need more details.
 
A lot of things can affect dump time. I am going to assume you are referring to a dump of similar databases on two different systems. By "similar" I mean same schema, structure, and data. But having those items the same between two databases is not a sufficient condition for them to be identical. Nor does similar configuration of your servers ensure they will perform the same. Even a single server may perform differently at different times, based on workload, disk usage, file system health, etc. etc.

The type of structure you are dumping out of (Type I vs. Type II storage areas) makes a difference in dump time. So does the state of your indexes. You have to read the data in order to dump it, which means lots of index reads. If you have low index utilization in one DB vs. another, you will do more physical I/Os to read the same number of keys, which will take more time. Also, the degree to which records are physically scattered on disk can increase seek time, which affects overall dump time.

You didn't mention specifics so I don't know how you dumped. If you are interested in improving your dump time, have you tried running it multi-threaded (proutil dbname -C dump table dir -thread 1)? Are you spreading out your I/O load as much as possible, e.g. writing dump files to a non-DB disk (or disks)? Are you ensuring that there is no other I/O load on your storage at the time of the dump? Are you running multiple dumps concurrently? Are you writing to RAID 5 storage? (You don't want to be.)
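
For example, a threaded dump of one big table might look roughly like this (the target path and thread count are illustrative, and option availability varies by release):

# Illustrative multi-threaded binary dump of one large table.
# -thread 1 enables threading; -threadnum, where supported, caps the thread count.
$DLC/bin/proutil $DUMPDB/ttgres -C dump cabininv /dump2/bindumps -thread 1 -threadnum 4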

If you have a mix of table sizes, e.g. some very large history/transaction tables and some small configuration tables, you might consider a hybrid approach where you dictionary dump the small stuff and binary dump the large tables. Or, alternatively, have a single shell script where you binary dump all your small-to-medium size tables serially and single-threaded, and then others where you dump the big ones in parallel, multi-threaded.
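
A rough sketch of that second option, with placeholder table names and paths:

#!/bin/sh
# Illustrative hybrid dump driver; table names and paths are placeholders.
DB=$DUMPDB/ttgres
OUT=/dump2/bindumps

# small-to-medium tables: serial, single-threaded
for t in small_tbl1 small_tbl2 small_tbl3
do
    $DLC/bin/proutil $DB -C dump $t $OUT -RO
done

# large tables: concurrent background dumps (optionally add -thread 1)
for t in big_tbl1 big_tbl2
do
    $DLC/bin/proutil $DB -C dump $t $OUT -RO &
done
wait    # do not exit until all background dumps have finished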

There are a lot of variables you can play with, and a lot depends on what kind of data sizes and hardware you're dealing with. In short: be disciplined, run a lot of tests, ensure they are repeatable (e.g. across days and/or reboots), and document your results.
 
Here is the command we have tried:
$DLC/bin/proutil $DUMPDB/ttgres -C dump bkstatus $DHOT2 -RO
where DUMPDB and DHOT2 are paths. Both databases are Type I and are similar in tables, data size, and indexes. We created a copy of the database on the other machine, so the scatter factor for the tables and the blocks for the indexes are the same. There are around 4 background scripts running the dump in parallel using shell scripts. In order to follow the policy that we need to load the lower tables first, we have dumped the lower tables first. Usually we shut down the DB and make sure all processes are dead before the dump starts. Both servers are connected to a SAN configured with RAID 10.
 
I don't understand this statement: "In order to follow the policy that we need to load the lower tables first". You are loading this data back into a Type II storage area, right?

Look at your disk I/O activity: is it high reads or high writes? If your data is scattered around a Type I storage area, your disk I/O at 100% may be due to trying to read the database records, not writing the dump data. Also, Glance will say your disk I/O is at 100% even when it isn't, if Glance is where you are getting your stats from.
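
As a cross-check on Glance, a plain sar sample of per-device activity is worth capturing too (the interval and count below are arbitrary; sar on its own will not give you the read/write split, which is why Glance is still handy for that):

# Sample per-device disk activity every 5 seconds, 12 samples.
sar -d 5 12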

Have you tested dumping the tables in multi-user mode vs. offline with the -RO parameter?

Have you looked at using a different index to dump the data instead of the primary? There is a lot of info about reviewing a database analysis and picking the fastest index to use for the dump.
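
For example, a hedged sketch of dumping one table via a non-primary index (the index number 28 is made up; pick a real one from your index analysis):

# Illustrative: dump bkstatus using a specific index instead of the primary.
# -index takes the index number, which you can look up in the database
# analysis output or the _Index table.
$DLC/bin/proutil $DUMPDB/ttgres -C dump bkstatus $DHOT2 -index 28 -RO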
 
If you are sophisticated enough to be able to dump and load, there's no reason to stay on Type I storage. It's holding you back in several ways: speed, reliability, ease of maintenance, future-proofing, etc.
 
The statement means that the load should be performed with the smaller tables first, then proceed to the larger tables.

Currently we are performing our dump when the database is down. The scatter factor values were not good; some of the tables even have a value of more than 8. I think the disk is at 100% because of disk reads, and I am using Glance to gather the resource status.
Some examples from the DB analysis of our Type I storage areas:

Table          Records     Size   Min  Max  Mean  Fragments  Factor  Scatter
PUB.cbdsc        79205    10.6M    83  317   140      86367     1.1      8.2
PUB.cbinv     25733343     3.4G   124  223   141   29168901     1.1      8.0
PUB.adef         24945     5.4M   114  331   225      32990     1.3      8.8
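
For reference, these figures come from a database analysis report; a minimal sketch of producing one (the output path is just an example):

# Run a database analysis; the record counts, fragmentation and scatter
# factors quoted above appear in its record block summary section.
$DLC/bin/proutil $DUMPDB/ttgres -C dbanalys > /tmp/ttgres_dbanalys.rpt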

We have also tried using a different index for the dump. We identified some tables and used the index option for those tables.
Is it possible to reduce the disk read rate without downtime? I want to increase dump performance and reduce the disk I/O rate.
One more question: why are these factors behaving differently on each server? If I am not mistaken, these factors should apply to both servers, right?
 
Just load into Type II storage areas and don't worry about smallest first - that advice dates from the days of Type I storage areas. You might want to make sure you aren't following other advice that is no longer relevant. Maybe search for other posts that talk about dumps and loads.

I would probably create new Type II storage areas and then gradually move the tables to the new areas. I would move the tables that have the most read activity first. You will see the buffer hits go up after you move the tables to the new storage areas: performance will jump, CPU utilization will go up, disk I/O will go down, and you will be a hero.
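
Something along these lines, as a sketch only - the area number, extent path and size, and table name are placeholders; check the syntax against your release:

# add.st -- illustrative structure-file fragment: a new Type II area
# (area 20, 64 records per block, 8 blocks per cluster) on another filesystem
d "Data_II":20,64;8 /db2/ttgres_d1.d1 f 1024000

# With the database down, add the new area, then move a heavily read
# table (and its indexes) into it. Names below are placeholders.
prostrct add $DUMPDB/ttgres add.st
$DLC/bin/proutil $DUMPDB/ttgres -C tablemove busy_table "Data_II" "Data_II"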
 
I agree. Getting off Type I areas should be priority #1.

Unless you are also not running after-imaging, in which case you should stop messing around with a dump & load and get AI implemented yesterday.

Anyhow -- your new database should have Type II areas. Friends don't let friends store data in Type I areas.

Reducing the disk read rate during a dump should not be your objective. You should be aiming to *increase* it. You want to spin those disks just as fast as they can possibly go to pull the data off of them as fast as possible. That means they should be very, very busy. If the disks aren't doing much it means that your dump is going very slowly.
 