Dump Load - Migration to Red Hat Linux

ymjamal

New Member
We are moving from 32-bit OpenEdge 10.1A running on 32-bit HP-UX 11.11,
to 32-bit OpenEdge 10.2B running on 32-bit Red Hat Enterprise Linux 6.0.

We are currently running our database spread across several disks using MirrorDisk UX, and will be moving to a RAID 5 configuration in our new environment.

My questions are regarding the DUMP/LOAD:

Current Status:
Application: Character-based forest products Order Entry and Sales application written by PSI (Progressive Solutions), all in Progress 4GL.
We have bought the source code and have done extensive customization.

Database:
1 database, all in the schema area.
Size: 40 GB
Tables: 668
Empty tables: 208 (these tables are for parts of the purchased software that we do not use).
Tables with over 10,000 records: 113, occupying 39.5 GB
Tables with fewer than 10,000 records: 555, occupying 500 MB

Need help to choose the best method for dump and load:
1. Should we be concerned about the unused tables? If it will not affect performance we would like to keep them in case we decide to use those areas of the application in the future.

2. Best strategy for dump and load, assuming we can have the system unavailable for, say, 48 and maybe 72 hours.

2.1 Custom Dump/Load:
Phase 1: identify the historical records of the large tables (i.e. information that cannot change, for example closed orders and their respective shipments and invoices), custom dump up to a given unique key (say up to an order number), and load the historical data into the new DB.
Phase 2: While the system is unavailable, custom dump the small tables and the non-historical data from the large tables (say from that order number on) and custom load them into the new DB.

2.2 Binary Dump/Load

2.3 Character Dump/Load

TIA
 
I'm not able to advise on the best strategy, but I would definitely advise a Binary dump and load as it should be a lot faster than ASCII.
 
Practice a few times.
Look at multithreaded binary dumps for the large tables (see the sketch below).
Look at using indexes other than the primary index for tables that take a long time to dump.
All your data is in the schema area, so the dumps will be very slow - the dump might take 75% of the total time.
As tables finish dumping on the source machine, send them over to the target and begin loading.
Remember to build indexes as part of the binary load, or separately with idxbuild at the end.

Make sure you load data in new Type II storage areas.
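For the big tables, a threaded binary dump plus load and index rebuild might look roughly like this. The db paths, table name, and parameter values are made up, and -thread needs an Enterprise license and a release that supports it; if your 10.1A source doesn't, you can still run several single-threaded dumps in parallel, one per table:

  # Source side: dump one big table with 4 threads; with -thread 1 the dump
  # writes one .bd file per thread (ord_hdr.bd, ord_hdr.bd2, ...).
  proutil /src/proddb -C dump ord_hdr /dumps -thread 1 -threadnum 4

  # Target side: load every piece, then build the indexes at the end.
  for f in /dumps/ord_hdr.bd*
  do
      proutil /tgt/newdb -C load "$f"
  done
  proutil /tgt/newdb -C idxbuild all -TB 31 -TM 32 -B 1024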
 
We are moving from 32-bit OpenEdge 10.1A running on 32-bit HP-UX 11.11,
to 32-bit OpenEdge 10.2B running on 32-bit Red Hat Enterprise Linux 6.0.
Why, in the year 2012, would you use a 32-bit OS? Especially on a database server with a 40 GB database? RAM is relatively cheap, and if you have a lot of it you can allocate a large buffer pool, provided that the OS and OpenEdge can address it, i.e. they are 64-bit. I have a client with a DB larger than that and the entire thing fits in -B (rough numbers below). This gets you a very good cache hit percentage, and you avoid unnecessary I/O. Which brings us to the second point...
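The rough numbers, assuming an 8 KB database block size (db path and values here are purely illustrative): 40 GB is about 5.2 million 8 KB blocks, so caching the whole database means something like

  # ~5,242,880 buffers x 8 KB = ~40 GB of shared memory; feasible only on 64-bit
  proserve /db/proddb -B 5242880

which is far beyond what a 32-bit executable can address.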

We are currently running our database spread across several disks using MirrorDisk UX, and will be moving to a RAID 5 configuration in our new environment.

I don't know what MirrorDisk UX is, but I do know what RAID 5 is, and I know you want to stay the hell away from it on a database server (unless you're not planning on doing any writes). Trust me, there will be weeping and gnashing of teeth.

My questions are regarding the DUMP/LOAD:
Current Status:
Application: Character-based forest products Order Entry and Sales application written by PSI (Progressive Solutions), all in Progress 4GL.
We have bought the source code and have done extensive customization.

Database:
1 database, all in the schema area.
Size: 40 GB
Tables: 668
Empty tables: 208 (these tables are for parts of the purchased software that we do not use).
Tables with over 10,000 records: 113, occupying 39.5 GB
Tables with fewer than 10,000 records: 555, occupying 500 MB

Before you start dumping and loading, you need to decide what you're loading into. Right now you basically have a version 8-era structure. You need a proper storage area design. You need to get all of your storage objects out of the schema area and into Type II storage areas with carefully-considered RPB and cluster size settings. As this will be a new database you also have the opportunity to consider your DB block size, in case that may need to be changed. You should also think about BI file settings and after-imaging.
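To give you an idea of what that looks like in practice, here is a fragment of a .st structure file defining Type II areas, followed by the create command. The area names, RPB and cluster values, paths, block size, and extent sizes are purely illustrative; yours should come out of your own analysis:

  # "Area name":area-number,records-per-block;blocks-per-cluster
  # A blocks-per-cluster value of 8, 64 or 512 makes the area Type II.
  d "Order_Data":10,64;512 /db/newdb_10.d1 f 2048000
  d "Order_Data":10,64;512 /db/newdb_10.d2
  d "Order_Index":11,1;8 /db/newdb_11.d1 f 1024000
  d "Order_Index":11,1;8 /db/newdb_11.d2

  prostrct create /db/newdb /db/newdb.st -blocksize 8192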

Read Tom Bascom's Storage Optimization Strategies for some ideas on how to lay out your DB structure. Good structure design also requires some knowledge of the application, and some analysis of both static data (e.g. dbanalys output) and runtime data (e.g. table and index CRUD stats). These can be obtained from the _TableStat and _IndexStat VSTs, assuming you (a) have set your -tablerangesize and -indexrangesize startup parameters appropriately and (b) you can write the code you need to get at the data.
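For (a), with 668 tables the defaults won't cover you; something along these lines in the broker startup will (the values are just examples - base them on your actual table and index counts):

  # Collect CRUD stats for all application tables and indexes; the defaults
  # (-tablerangesize 50, -indexrangesize 50) only cover the first 50 of each.
  proserve /db/proddb -tablerangesize 700 -indexrangesize 2500 -basetable 1 -baseindex 1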

Need help to choose the best method for dump and load:
1. Should we be concerned about the unused tables? If it will not affect performance we would like to keep them in case we decide to use those areas of the application in the future.

2. Best strategy for dump and load, assuming we can have the system unavailable for, say, 48 and maybe 72 hours.

2.1 Custom Dump/Load:
Phase 1: identify the historical records of the large tables (i.e. information that cannot change, for example closed orders and their respective shipments and invoices), custom dump up to a given unique key (say up to an order number), and load the historical data into the new DB.
Phase 2: While the system is unavailable, custom dump the small tables and the non-historical data from the large tables (say from that order number on) and custom load them into the new DB.

2.2 Binary Dump/Load

2.3 Character Dump/Load

TIA

Purging and/or archiving data, if allowed by the business, is worthwhile to do. You have to do it eventually, and the less data you have the faster your D&L will be. That said, you aren't faced with a monumental task. With a 40 GB database, you can dump and load several times over in the span of 48 hours, no matter how you do it. You don't have to get into really exotic methods.

Run a proutil dbanalys report to get a sense of what you're faced with. I like to bring it into Excel so I can sort the data. You will probably find that a handful of tables make up the majority of your data. When faced with that scenario I typically do a binary dump/load/index rebuild with those large tables, and then dictionary dump/load the rest. I run the loads with the DB online, so I can use helper processes (BIW/APWs) and promon, with a maxed-out BI, and in no-integrity. Then shut down after the loads, do a backup, and rebuild the remaining indexes for the tables that were binary loaded. Depending on the speed of your server and disks, I wouldn't expect this to take more than a few hours.
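To make that concrete, the skeleton of such a run might be something like the following. Db names, paths, and parameter values are placeholders, and no-integrity is only acceptable here because a failed load can simply be restarted from the dump files:

  # Source: dbanalys report for sizing and table-by-table planning
  proutil /src/proddb -C dbanalys > /tmp/proddb_dbanalys.txt

  # Target: prepare the BI, then start the DB in no-integrity with helpers
  proutil /tgt/newdb -C truncate bi -bi 16384
  proutil /tgt/newdb -C bigrow 32
  proserve /tgt/newdb -i -B 100000 -spin 10000
  probiw /tgt/newdb
  proapw /tgt/newdb
  proapw /tgt/newdb

  # ... binary loads for the big tables, dictionary loads for the rest ...

  proshut /tgt/newdb -by
  probkup /tgt/newdb /backup/newdb_postload.pbk
  proutil /tgt/newdb -C idxbuild all -TB 31 -TM 32 -B 1024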

If you are concerned about the length of your D&L, you could also consider a preliminary step of taking a smaller downtime window first to compact or rebuild the indexes in your source database. That could help the speed and reliability of the dump and load.
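If you do, the online form is per-index; something like this (the table and index names and the target percentage are only examples):

  # Compact one index online to roughly 80% utilization; repeat per index
  proutil /src/proddb -C idxcompact order_header.order_header_ix 80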
 
A bit of advice on RAID 5: NEVER "EVER" MIGRATE A DATABASE TO RAID 5 OR RAID 6. A few months back, I tried to migrate a V9.1E DB on Red Hat to V10.2B on Red Hat with RAID 5. It took an entire week just to binary dump around 50 GB of data. Unless it is a read-only, data warehouse, or reporting server, insist on RAID 1+0.

On D/L, I prefer a highly parallel binary D/L to migrate to the new environment.

Since you plan to migrate to the latest versions of both OpenEdge and Red Hat, why stick to a 32-bit version? Prefer 64-bit if you can, as it has many performance advantages over 32-bit.

Regards,
Ssuhaib.
 
Thanks for your input. It has been a great help in deciding how to proceed.
I'm now in the process of creating and customizing binary dump scripts to test on a QA database.
For the large tables, based on my knowledge of the software and the underlying database, I intend to use dumpspecified.
It seems that I can only refer to a field if that field is the unique component of an index or the first element of a compound index.
This allowed its use on some of the large tables. Other tables have compound indexes only, especially the orders, shipments, and invoices tables. There the index is preceded by a field defining the "entity", as the software was designed for a multiple-entity database.
For example, on the ord_hdr table we have an index named ord_hdr26 made up of entity and ord-num.
I do not seem to be able to use this index ord_hdr26, or ord-num, in the proutil command. Am I doing anything wrong, or is that the limitation?
TIA
 
Dump specified is one of those things that looks good on paper but which actually isn't all that helpful most of the time.
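For reference, the general form is below (db path, table, field, and value are all invented). As you found, the field you select on has to be the leading component of an index, so with ord_hdr26 being (entity, ord-num) you could bracket on entity but not on ord-num:

  # The selection field must be the leading component of an index on the table
  proutil /src/proddb -C dumpspecified customer.cust-num GT 500000 /dumps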
 