Data Extraction performance

Drammy

Hi all,

Our developers have written a routine that extracts all data from all tables to text files. The routine does some data manipulation, such as replacing the Progress unknown value (?) with a blank string.

In my initial tests I was seeing output speeds upwards of 7,000 rows per second.

Since then the app has been installed at some of our client sites and we have been seeing poor performance (in the region of 500 rows per second).

Can anyone recommend any database admin tasks we could perform to try to improve performance?


Thanks,
Martyn
 
What's the system load when you are running the query to dump the records?
Some points:
Run the code in the background.
If possible, do a multi-threaded dump.
It would also be better to load the data into a temp-table first and then read from the temp-table when writing to disk (I/O issues can be minimised).
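To illustrate the temp-table idea, a minimal sketch; the table and field names (customer, name, city) are invented here, not from the actual app:

```progress
/* Illustrative sketch only: buffer the reads into a temp-table,
   then write the whole temp-table to disk in one pass.          */
DEFINE TEMP-TABLE ttCust NO-UNDO
    FIELD cName AS CHARACTER
    FIELD cCity AS CHARACTER.

FOR EACH customer NO-LOCK:
    CREATE ttCust.
    ASSIGN ttCust.cName = (IF customer.name = ? THEN "" ELSE customer.name)
           ttCust.cCity = (IF customer.city = ? THEN "" ELSE customer.city).
END.

OUTPUT TO VALUE("customer.txt").
FOR EACH ttCust:
    PUT UNFORMATTED ttCust.cName "|" ttCust.cCity SKIP.
END.
OUTPUT CLOSE.
```

This separates the database reads from the file writes, so the disk I/O for the output file doesn't interleave with the record fetches.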

If you can throw some light on the code used, it will be easier to provide a more meaningful answer.

Arshad
 
Thanks Arshad,

The tool has been designed as part of a batch process, to be run overnight when system load is minimal.

I'll raise your points with our developers; however, I can't really provide code as I don't have access to the development team's SourceSafe.


One of our clients has just shown us some fragmentation stats, and most tables have a scatter factor over 5, with one table up to 6.8!

It looks like we have to perform a data dump and load to defragment the data.
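For anyone following along: the scatter figures come from a dbanalys report, and a per-table binary dump and load looks roughly like this. The db name, table name, and paths are placeholders, and the exact utility syntax should be checked against your Progress version:

```shell
# The dbanalys report includes the scatter factor per table
proutil mydb -C dbanalys > mydb_analys.txt

# Offline binary dump & load of one table, then rebuild the indexes
proutil mydb -C dump customer /dumpdir
proutil mydb -C load /dumpdir/customer.bd
proutil mydb -C idxbuild all
```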


Are there any other similar DBA tasks that I could investigate that would affect performance?
 
It all depends...

How are your clients configured? Has any db tuning or configuration ever been done? Has anyone in your company ever taken a Progress DBA course or worked closely with an experienced Progress DBA? What version of Progress are we discussing? What OS does this run on? What hardware resources are available?
 
OpenEdge 10.1B
We are a Progress software house. I don't know if anyone in our company has taken a Progress DBA course, but most of the technical support team seem to know what they're doing.
We have one client reporting this issue on Red Hat Linux and another on Windows 2003.
On the Linux environment some db tuning has been done. About a year ago some of your technical support went out and partitioned the db into separate storage areas (Type 2 storage). I don't know if db tuning has been done on the Windows environment.

Hardware resources?
 
HW Resources == what sort of machine? What does it have in it? Especially as regards CPU (speed, number of cores...), RAM and disk.

As for the problem reported... it is hard to say what is wrong without looking at the system but it sounds like it might be an untuned "out of the box" db install. In which case it is likely that -B is way too low. On almost all reasonably modern HW you can set it to 50,000 without worries (in reality you can probably do 100,000 pretty much blind). Especially if this is the only db on the box.
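To make that concrete: -B is set at broker startup and is measured in database blocks, so on an 8 KB block database -B 50000 is roughly 400 MB of buffer pool. The db name and service port below are made up:

```shell
# Start the broker with a bigger buffer pool; -B is in db blocks,
# so 50000 x 8 KB blocks is about 400 MB of RAM
proserve mydb -S 20000 -B 50000
```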

Beyond that though you'd really have to get into the particulars much more heavily. 500 records per second is rather under-whelming. Even for a horribly tuned system.

As an aside... a lot of "Progress software houses" are very heavily focused on the development side and are very capable there but, to be polite, are somewhat under-staffed with regards to database configuration and deployment issues. Their software tends to work really well in their initial deployments but as customers grow, or as the company starts to acquire larger customers, they start to run into situations that they don't know how to handle. Worse -- they may not even be aware that their customers are frustrated and angry. Personally, I think that these development shops could benefit greatly from investing a bit of time with an experienced consulting firm :cool:
 
Thanks Tom,

HW resources of the 2 clients:
Linux: 12 CPU cores, Xeon box with 16 Gig RAM, not sure on HD. The -B setting is 375000.
Windows 2003: Not sure, not my client. But as a company we insist on good hardware, probably not as good as the box above though.

The Linux box has two dbs on it, Production and a Model office test environment.
The Windows box has three dbs on it, Production, Test and Dev.

To be fair I would have to agree with you on the Progress software houses comment.

Anyway, we are where we are and I need to get this process faster. Do you think that dumping and loading the data will improve the scatter factor and thus improve the performance of the extraction tool?

Is there anything else we can do?


Cheers,
Drammy
 
With big B at 375000 this doesn't sound so much like an "out of the box" install so that probably isn't the immediate issue.

500 records/second is probably something else.

A high scatter factor, by itself, probably isn't the main problem but it may be aggravating the situation. Sure, dumping & loading would improve scatter. But the reloaded data will be in order of the primary index -- which may not be what your extract is needing. If a D&L does improve performance it will only be a short-term solution. You're masking the real problem -- it is a red flag that something else is wrong that ought to be fixed.

How large is the table that this data is coming from?

What percentage of the records in the table are being extracted?

Is this table in a type-2 storage area?

If it is a type-2 area what is the cluster size?

Is it mixed in with other tables or in a dedicated area?

What are the min, max and avg row size?

What is the area's rows-per-block setting?

DB block size?

Have you checked the extract program's index usage with COMPILE XREF?

Is this a workgroup db or an enterprise license?

What are the rest of the db startup parameters?
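On the XREF point, checking the extract program's index usage looks like this (the program name is a placeholder):

```progress
/* Writes a cross-reference listing; look at the SEARCH lines in
   dump.xrf -- "WHOLE-INDEX" means the query cannot bracket on an
   index and will scan the entire table.                          */
COMPILE dump.p XREF dump.xrf.
```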

Disk resources matter. 999 out of 10 performance problems (that are not bad programming) are disk problems. Once you fix the disk problem the next bottleneck is probably also a disk problem...

Some of my technical support team? :eek: Who would that have been?
 
Sorry Tom, that should have read "our". Apologies if it alarmed you; I can understand how it might have!


Thanks for your continued assistance, and I understand why it is important that your questions are answered. Unfortunately I am unable to answer them, as I am not in the technical support team and don't have access to these details. We have a plan, though (below), and if this proves fruitless I will try to find out the answers to your questions.

We have since noticed that these two clients are actually running OpenEdge 10.0B; I thought they were on 10.1B.

Are there any radical differences in sequential-access performance between 10.0 and 10.1?

One client is on Type 2 storage and the other isn't. We plan on getting both dbs up to Type 2 storage and then testing (on our systems) on both 10.0B and 10.1B to see what differences we can find between versions.
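For reference, a Type 2 area is one defined with a cluster size in the structure (.st) file; the area name, area number, rows per block, cluster size and extent path below are only illustrative:

```
# d "AreaName":area-number,rows-per-block;blocks-per-cluster extent-path
d "DataArea":8,64;512 /db/mydb_8.d1
```

A cluster size of 8, 64, or 512 blocks marks the area as Type 2; omitting it gives a Type 1 area.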


Thanks for your help so far,
Drammy
 
No, there are no radical differences in sequential-access performance between 10.0 and 10.1 -- but you haven't exactly convinced me that you are engaging in sequential access either ;)

Are these clients perhaps running a workgroup database license? Or running enterprise without having set -spin?
 
Well the actual process cycles through each table and outputs every record into a text file.

Each table is output to a separate text file. It's a bit like the Progress utility to dump a table's data, except that it manipulates the data into a more usable format for loading into third-party db systems or other data-handling routines.
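I can't post the real code, but in outline each table's loop is something like this (the table and field names are invented for illustration):

```progress
/* Generic per-table dump loop with the unknown-value (?) to
   blank-string replacement described earlier in the thread.  */
OUTPUT TO VALUE("customer.txt").
FOR EACH customer NO-LOCK:
    PUT UNFORMATTED
        (IF customer.name = ? THEN "" ELSE customer.name) "|"
        (IF customer.city = ? THEN "" ELSE customer.city) SKIP.
END.
OUTPUT CLOSE.
```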
 
They all run Enterprise with a -spin setting of 5000 per processor (only for multiprocessor servers).

You might want to triple check that.

Workgroup on multi-processor (or multi-core) systems suffers from extremely poor read performance prior to 10.1B. Running Enterprise with -spin 0 is, essentially, the same as having WG. Post 10.1B, WG uses -spin 1 -- better than 0, but still not very good. It's possible that -spin got forgotten or commented out or something to that effect...
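A quick sanity check is the broker's startup parameters: make sure -spin actually appears, uncommented, in the startup command or .pf file. For example (values illustrative):

```
# mydb.pf -- broker startup parameters
-B 375000
-spin 5000
```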

You're dumping the whole database with this process? Is there just one session running? I'd expect that you would have fairly awful performance if you have a single disk and you are reading and writing to the same disk. I'd also expect really bad performance if the disk is RAID5.
 
I'll get on with checking it then.

Yes, the whole db. I've tested with one session and with three. Normally the process runs from a client over the network, so it is reading from and writing to separate disks and controllers. I've noticed no difference between running on the server and running over the network from a client.

I'll check the RAID 5.
 