9.1D08 Slow ASCII Load

FocusIT

Member
Hi All

I was wondering if anyone has experience of dumping and loading a single table with over 500 million records. I am working on a 9.1D08 database running on W2003 Standard R2 which has severe database corruption. A proutil tabanalys throws up repeated 5433 errors due to a table having been dropped before its data was deleted. In fact, the file holding these errors grew to 7GB before I killed the proutil session. Therefore, I feel the only course of action is an ASCII dump and load through Data Administration. A binary dump and reload would only carry the corruption over to the new database.
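For reference, the analysis was run along these lines, with the output redirected to a file (the database name here is just a placeholder):

    proutil proddb -C tabanalys > tabanalys_proddb.txt

Redirecting it at least keeps the report, and the repeated 5433 messages, in one place.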

I have dumped the table containing the 500-million-plus records into six separate files, one for each year by timestamp. The first two files, containing 30 million records between them, loaded relatively fast, averaging 2k records per second, but the third, containing 50 million records, is loading very slowly at under 200 records a second. I have also tried loading the data as one file, and it too slowed down when it got to about 30 million records. On top of this I have tried loading single and multiple files with the database in single-user and multi-user modes. Again, the load slowed down at about 30 million records. The only thing left to try is enabling direct I/O.
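For what it's worth, a per-year export of this sort can be scripted in the 4GL along these lines; the table and field names below are made up for illustration, not the real schema, and the dictionary's own dump adds a trailer that this sketch omits:

    /* minimal sketch -- "txnhist" and "txn-date" are illustrative names */
    OUTPUT TO VALUE("txnhist-2006.d").
    FOR EACH txnhist NO-LOCK
        WHERE txnhist.txn-date >= 01/01/2006
          AND txnhist.txn-date <= 12/31/2006:
        EXPORT txnhist.
    END.
    OUTPUT CLOSE.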

Does anyone out there have a pearl of wisdom I could try?

The table has its own data and index area in the database.
 

TomBascom

Curmudgeon
9.1d08 is of course ancient, obsolete and unsupported. Dumping and loading would be an ideal excuse to upgrade...

A binary dump & load is no more likely to propagate this sort of corruption than an ASCII d&l. The "binary" portion just refers to the format of a record, not the selection of those records (which is the problem behind your 5433 errors). A binary d&l just skips the conversion of integer fields to ASCII strings and such. It doesn't affect which records get dumped.
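For reference, the binary route is just the proutil dump/load pair, roughly like this (database, table and path names are placeholders, and the load has to be followed by an index rebuild):

    proutil olddb -C dump txnhist d:\dump
    proutil newdb -C load d:\dump\txnhist.bd

Both routes read the same records; only the output format differs.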

Anyhow, does the data perhaps change in some way at around the 30 million record mark? Perhaps the rows are much larger? Or the keys denser (or less dense)?

Or perhaps it is just that right around 30 million rows you saturate your cache and you are now experiencing disk contention. (This would be my #1 candidate.)

Generally speaking loading with multiple threads and with a server running along with a BIW, one or more APWs, a large bi cluster size, pre-grown bi clusters, a moderate -B, -spin at a modest value like 10,000 and with -i (no-integrity) is going to give the best performance. But everyone's situation is different and your specific HW configuration is unknown.
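As a sketch of the sort of thing I mean (the database name and parameter values are illustrative, not a prescription for your hardware):

    proserve loaddb -B 25000 -spin 10000 -i
    probiw loaddb
    proapw loaddb
    proapw loaddb

Remember that -i means an interrupted load leaves the database unrecoverable, so you need a good backup (or an empty target you can recreate) before you start.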

Are indexes enabled or disabled for the load? There are situations where either choice could be better.

Monitoring the system's resources to identify the bottleneck that appears at the point of slowdown would help. It is probably disk oriented but it would be helpful to know which disk (assuming there is more than one and that you have arranged things so that different elements of the database are on different disks...)

Depending on what happened and when it happened and what has happened since you might also want to consider restoring from backup and rolling forward your ai files to just prior to the unfortunate maintenance accident.
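That path is, roughly, a prorest of the last good backup followed by an rfutil roll forward of each ai extent in sequence; roll forward also takes an endtime so you can stop just short of the accident. Something like the following, where the database name, paths and timestamp are all placeholders:

    prorest proddb d:\backup\proddb_full.bck
    rfutil proddb -C roll forward -a d:\ai\proddb.a1
    rfutil proddb -C roll forward endtime 2010:06:01:17:30:00 -a d:\ai\proddb.a2

Obviously that only works if after-imaging was actually enabled at the time.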
 

FocusIT

Member
Hi Tom, thanks for your reply.

I agree, 9.1D08 is ancient, but the app is locked into this version by the supplier, who are resisting an upgrade.

The hardware platform is reasonably specced as it is a recently retired live data server. The extent files for this data area sit on 16 physical drives (15K SAS) in RAID 10, and there are 8 processor cores and 4GB of memory. I too thought the disks were the limiting factor, so I loaded around 30 million rows from one file, bounced the server and truncated the BI before loading the next file. The next file was slow from the very first record, not only after another 30 million had loaded.

One theory is that Progress is scanning forward from the first extent to find the first empty slot for each record, hence after 30 million records it is now having to scan to the 8th 2GB extent to find an empty block. The load rate also seems to cycle: it runs at around 1k records per second, slowly drops over a minute to under 100 per second, then increases again. The overall average is 200 per second. The disk idle time drops along with the load rate and improves again when the load rate picks up.

I think there is something in your theory about the record size changing around the 30 million mark. The records are being loaded in date order, so once this next file has loaded it will be up to 2007, and I can check the average record size against 2006 and cross-check it against any schema updates.

The less said about the maintenance accident the better, but restoring and rolling forward is not an option: the damage was done over a month ago by a previous dump and load attempt (not by me), and the DB is not running with AI enabled.
 

TomBascom

Curmudgeon
If you've paid any attention at all you know what I think of vendors who resist upgrades...

There is, of course, no technical reason why you cannot upgrade to at least 9.1E04. If you're dumping and loading I'm thinking that you are also probably re-compiling, which would mean that there is absolutely nothing stopping you from going to 10.2.

But anyhow... A change in the nature of the data could be it. I think you said that the files are split up by date range? If so then are they all roughly the same size for similar date ranges? (They should be if business volume has been consistent.)

It might also be that you've reached a tipping point in the index structure (if they are enabled for loading).

If indexes are enabled for loading you might try disabling them. Likewise if they are disabled you might try enabling them.
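If you do load with them disabled, the rebuild afterwards is the usual offline idxbuild, along these lines (the database name is a placeholder, and it will prompt you about sort space before it starts):

    proutil newdb -C idxbuild all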

What do your -T files look like? Are any of them out of control large?

I mentioned a number of tuning principles for d&l. You haven't indicated if you are following any of them.

The idea regarding scanning from the 1st file forward is not correct. That is not how it works.

Are you sure that it is RAID10? It's worth double checking -- it's one of those things that, for one reason or another, often turns out to be different than it initially appears to be.

You are, of course, planning to fix the "no ai" problem. Right?
 

FocusIT

Member
I am new to ProgressTalk so can't say I have read your other threads, but I can imagine. If the vendor doesn't upgrade soon then Microsoft are going to decide for them by retiring 32-bit Server 2003 and/or MMC2.

Tuning params are:

-B 100,000
DB Blocksize 4096
Spinlocks 50,000
-i enabled
Direct I/O disabled
4 * APW
BIW enabled

BI Blocksize 4096
BI Clustersize 16384
BI made up of 3 x 1GB extents and a variable extent

-T files all under 1 MB during load

The logical drive is definitely striped and mirrored; the server was built in-house. The AI issue will be addressed once the database is in a decent state.

I will try disabling the indexes for the next file. This will add at least 10 hours to the D&L to rebuild them, but hopefully that will be saved in the load time. Each file is much larger than the last as the system's transactional load has grown year on year. I will do a tabanalys between each file to see if the average record size is growing.

If all else fails I will try a binary D&L for that particular data area.
 

TomBascom

Curmudgeon
Large -B values can be counter-productive during a load operation. You have to experiment to see, though. They are definitely counter-productive for an index rebuild. Keep -B under 25k if doing an index rebuild.

-spin is almost certainly way too high. Try 10000. You can change it while the db is live through PROMON.
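From memory the live change is made in promon under R&D, Administrative Functions, Adjust Latch Options (double-check the exact menu labels in your release):

    promon proddb
      R&D
      Administrative Functions
      Adjust Latch Options
      Spins before timeout -> 10000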

BI blocksize is small but it likely doesn't matter all that much. BI cluster size is also small and might matter a lot more. I usually use 65536 when loading. And bigrow a bunch of clusters first. It helps.
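Concretely, with the database shut down, something like this (the database name and the number of clusters are placeholders; the -bi value is in KB, so 65536 is a 64MB cluster):

    proutil loaddb -C truncate bi -bi 65536
    proutil loaddb -C bigrow 8

Pre-growing the clusters means the load isn't pausing to format new ones on the fly.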

You might want to poke around with a tool such as ProTop (you should probably download the character client) to help find bottlenecks.
 