Creating Records in the DB

jamesmc

Member
Howdy Folks,

I have a quick question about the speed difference between creating records in a temp-table and creating them in the DB itself.

I have got a CSV file that a suite of Progress apps needs access to. The first idea I had was to create a temp-table and share it, which works fine, but once my application has closed I can't review the info that was loaded.

I then thought of creating a table in the DB with the exact same structure as my temp-table. I pull in the data the same way and I create the records the same way. Using the DB allows me to review the data that was loaded and I don't need to keep reloading every time I have to restart my apps.

The big difference between these two methods that I have noticed is that it takes approx 40 secs to load into the temp table but it takes nearly twenty minutes to load into the DB!

Is there something that I am missing? There are three fields in the table and one index. The first field is an incremental record ID number that I assign myself (no sequences or anything) and the other two fields are characters, about 8 and 74 characters in length respectively. The index is primary/unique on the ID field. The database is a small one that I have created for this testing and it is local to my machine (PIII running W95 and Progress 8.3b). Oh yes, I almost forgot: there are 180,000 lines that need to be loaded.
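For reference, the structure is roughly this (the field and index names below are just placeholders, not the real ones):

Code:
DEFINE TEMP-TABLE ttCsvLine NO-UNDO
    FIELD iRecId AS INTEGER                  /* incremental ID, assigned in my own code */
    FIELD cCode  AS CHARACTER FORMAT "x(8)"
    FIELD cText  AS CHARACTER FORMAT "x(74)"
    INDEX idxRecId IS PRIMARY UNIQUE iRecId.

The DB table has the exact same fields and index.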

Can someone please enlighten me on whether there are any steps that I can take to speed this process up? I know that there is going to be a lot of disk activity when writing to a DB (I have seen the lights flashing!) but is there any way of enhancing the performance?

Thanks In Advance,

James.
 

MurrayH

Member
Ways to speed up the DB

There are numerous ways to speed up a database:
* more -B
* more APWs (if needed)
* creating fixed extents so the OS doesn't need to request more disk space
* load in single user mode with "-i" (not for the faint of heart)

It's a little vague what you are doing here. Do you want to load it once for all time, load it every so often, or load it each time the CSV file changes? Do you want to throw the old stuff away in that case?

Anyway, assuming you want it loaded in multi-user mode, throw more -B at it and maybe a few APWs, along the lines of the sketch below.
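For example, something roughly like this on the command line (using "mydb" as a stand-in database name; the -B value is purely illustrative, since it is a count of database buffers and should be sized to the available RAM):

Code:
proserve mydb -B 10000
proapw mydb
proapw mydb

proserve starts the broker for the database and each proapw starts another asynchronous page writer against it.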

Murray
 

mra

Junior Member
Hello James!

Have you started a db server for your local machine? Or are you connecting directly to the db file? There may be a performance gain here too.

Also, emptying a temp-table is much faster than deleting every row in a database table; maybe you can gain something there too? A rough comparison is sketched below.
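Roughly the difference ("ttCsvLine" and "loadtable" are stand-in names, and EMPTY TEMP-TABLE may not be available on 8.3; the fallback there is a FOR EACH ... DELETE over the temp-table, which still happens in memory rather than against the database):

Code:
/* temp-table: lives in memory, nothing is written to the database or BI file */
EMPTY TEMP-TABLE ttCsvLine.

/* database table: every delete is a logged database write, so much slower */
FOR EACH loadtable EXCLUSIVE-LOCK:
    DELETE loadtable.
END.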

Regards
Mike
 

jamesmc

Member
Sorry if I was a little vague to begin with. I am connecting directly to the DB that is on my own machine and, yes, I will be wanting to clear down the file every time I get a new CSV file (like today: the new file is just over 202,000 lines long).

I used to use the temp-table option but it meant that I had to load the data every time I ran the code, and if the code crashed for any reason I could never query the data as it had disappeared!

I will start a server for the DB and connect that way and see if the speed increases, but if there is any more advice then please send it on.

Thanks,

James.
 

MurrayH

Member
OK then, what platform are we on?

Anyway, the first thing to do is convert the CSV file to use quote-space format. EG:

"100" "My Name" "My info"

instead of

100,"My Name", "My info"

Then use the IMPORT statement.
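A minimal load loop would be something along these lines ("loadtable" and the file path are placeholders for your own table and file):

Code:
INPUT FROM VALUE("c:\temp\data.txt").

REPEAT:
    CREATE loadtable.
    IMPORT loadtable.   /* one quote-space line fills the record's fields in order */
END.                    /* end of file raises ENDKEY, which ends the REPEAT */

INPUT CLOSE.

Each iteration of that REPEAT is its own transaction, so it is simple but not the fastest way to do it.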

If you are in multi-user mode:
* Make a REALLY big -B for the session that is loading the CSV
* Beef up the bi blocksize to 8k and increase the bi cluster size (if not there already)
* Add some APWs
* Make some fixed extents so the OS has disk space allocated

If you are in single-user mode:
* As above, but drop the APWs and put in "-i" if you want really fast loading and are sure the power is not going to go out (see the command sketch below)
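Concretely, something along these lines (names and values are illustrative only, and the BI truncate has to be done while the database is shut down):

Code:
proutil mydb -C truncate bi -biblocksize 8 -bi 1024
pro mydb -1 -i -B 10000

The first command sets an 8k BI block size and (here) a 1024k cluster size; the second starts a single-user session with no crash protection (-i) and a big buffer pool. On Windows the client would be prowin32 rather than pro.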

However, loading 202,000 records is not going to be fast anyway.

Does that help??

Murray

PS: If it's on the same box, ensure you are using shared memory and not socket connections.

EG: prowin32 -db c:\mydb
or
pro /data/mydb
 

Chris Kelleher

Administrator
Staff member
James, I would look at your transaction scoping as well... if you are adding each of the 200+k records as a separate transaction, there can be considerable overhead, which could be causing the slow loading times. When importing records, I'll usually try to add 100 records at a time, each batch in its own transaction; this seems to be the fastest way to load into the database.
 

Chris Kelleher

Administrator
Staff member
Here's an example of what I am talking about:
Code:
DEFINE VARIABLE i AS INTEGER NO-UNDO.

INPUT FROM VALUE("c:\temp\data.txt").   /* placeholder path to the data file */

outerloop:
REPEAT TRANSACTION:

    /* load 100 records per transaction; "record" stands for the table being loaded */
    REPEAT i = 1 TO 100 ON ENDKEY UNDO, LEAVE outerloop:
        CREATE record.
        IMPORT record.
    END.  /* at end of file IMPORT raises ENDKEY: the empty record just created is
             undone and we leave, so the final partial batch is still committed */

END.

INPUT CLOSE.
 

jamesmc

Member
Thanks Chris. I had a go at creating this myself, but I did a DO WHILE v_counter < 100 instead of a REPEAT loop and I ended up not getting the last 57 records, as the transaction was never committed (the counter never got above 100)!

This took me a little while to figure out, but I have taken your example on board so hopefully I shouldn't have the same problem again.

Thanks again.

James.
 