Data extract via ODBC... performance problems

gah

New Member
Hi,
I have an issue when extracting data via ODBC.
Progress 10.1C on a Win2003 server: 4 quad-core processors, 4GB RAM.
DataDirect OpenEdge 10.1C driver 5.30.00.74, configured with READ_UNCOMMITTED, FETCH_ARRAY_SIZE = 50, and no trace.

The issue is that when extracting data via ODBC, performance degrades over the course of the extract.
An example of this is as follows:

SELECT * FROM TABLE... 2.8M rows in the table

Code:
Elapsed time     30 secs    60 secs      90 secs      120 secs     150 secs
Rows returned    670,000    1,327,000    1,445,000    1,519,000    1,560,000
Rows/sec         22,333     21,900       3,933        2,466        1,366

(The rows/sec figures are for each 30-second interval.)

To investigate further I am running SQLDBX on this server to return results and eliminate any network bottlenecks.

What I find is that if I run the following

SELECT PK_COLUMN FROM PUB.TABLE

I can return all 2.8M rows in under 7 seconds.

If I run a single-column select on any indexed column I get similar sub-7-second performance.

However, if I run a single-column select on a non-indexed column, the extract performance degrades rapidly after about 1.3M rows.

...and if I run a two-column select (both indexed), it still slows down.

The data is being extracted from a 3rd-party Progress solution; we are not in control of the startup parameters, nor do we have a developer licence.

Has anyone any suggestions as to why we experience this slowdown?

many thanks
 

TomBascom

Curmudgeon
Have you run "update statistics" so that the optimizer can work properly?

(This requires no extra licenses nor any startup parameters; it's just SQL.)
 

gah

New Member
Many thanks for your response.

I have gone back to the vendor with your suggestion and await their response.

We have no Progress DBA and have relied entirely upon the vendor for the necessary DB maintenance, upgrades, etc.

I have looked at "update statistics" on PSDN and this might well prove fruitful... however, please excuse my ignorance, but do I run this from the Procedure Editor?

Also, any guide as to how long this may take for a 3M-row table with 25 columns and 6 indexes?

Kind regards
 

TomBascom

Curmudgeon
You would not run it, or any SQL, from the procedure editor. You run it from SQL Explorer or any other SQL tool.

Hopefully you aren't running the previously discussed queries from the Progress editor. That would be using SQL-89, which is just a horrid kludge on top of the 4GL engine and has nothing to do with the SQL-92 engine that you access through ODBC.

I hate to ask but... why are you extracting 3 million rows? That just seems awfully suspicious.

As for "how long?". I would hope that, after updating statistics, you would get comparable performance to your indexed queries.
 

gah

New Member
Hi Tom

I really appreciate your interest in this.

To give you some background on the situation I find myself in:

We use a Progress app designed by a 3rd-party vendor to provide core business functions... sales/purchases/stock management etc... I am sure you get the picture.

The vendor is a well respected Progress partner.

However, as a business we have specific (and constantly changing) reporting requirements.

To meet these internal demands we replicate the Progress tables over to SQL and report from there....this gives us the opportunity to add report specific indexes in SQL that the vendor would not wish to add to the app and would of course cost us to have implemented.

Additionally, because of our continually changing and growing demands for reports... it is not financially efficient to keep paying the vendor to amend existing reports, alter output formats, etc.

We replicate Progress tables to SQL 2005 by way of MS SQL SSIS, where we use Execute SQL tasks to update/insert changed rows.
We use OPENQUERY syntax against a linked Progress server...

For example: in some Progress tables a new record has a sequential PK identity number... therefore we extract the MAX(ID_No) from SQL as a variable and pass that back in the OPENQUERY as SELECT ... FROM ... WHERE ID_No > variable... the extract is passed to a staging table in SQL and from there we perform upserts.
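
On the SQL Server side it looks roughly like this (the linked server, table, and staging names here are made-up placeholders; OPENQUERY only accepts a literal string, hence the dynamic SQL):

Code:
-- Placeholders: PROGRESS_LNK (linked server), dbo.MyTable (replica),
-- dbo.MyTable_Staging (staging table), ID_No (sequential PK).
DECLARE @maxid INT, @sql NVARCHAR(MAX);

-- Highest ID already replicated to SQL Server.
SELECT @maxid = ISNULL(MAX(ID_No), 0) FROM dbo.MyTable;

-- Splice the watermark into the pass-through query string.
SET @sql = N'INSERT INTO dbo.MyTable_Staging
             SELECT * FROM OPENQUERY(PROGRESS_LNK,
             ''SELECT * FROM PUB.MyTable WHERE ID_No > '
           + CAST(@maxid AS NVARCHAR(12)) + N''')';

EXEC sp_executesql @sql;
-- ...then upsert from dbo.MyTable_Staging into dbo.MyTable.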

We run these every 5 minutes from SQL against the Progress linked server for all core tables (20+ tables, in excess of 100M total rows)...
This works very well, and under normal usage against our Progress production box we are sub-30 seconds.

The reason to extract all 3M rows (actually it will be 100M plus) is a) to initially populate SQL, b) to ensure that we can repopulate SQL if our SSIS package fails, and c) to repopulate periodically to ensure data integrity.

.......hope you are still with me!

Your reply stated "You would not run it, or any SQL, from the procedure editor. You run it from SQL Explorer or any other SQL tool."

We have recently upgraded to 10.1C with no dev licence.
In the previous version, 9.1x, we had access to SQL Explorer via the Progress toolset... but our vendor tells us that SQL Explorer is not available in 10.1C... I can't comment because I don't know.

When you mention "any other SQL tool"... I am unsure what you mean... could you please provide a little more detail?

We rebuild/reindex stats on SQL on a regular basis....but from the way I read your reply this is something that has to be done on the Progress db.

No... I haven't run "update statistics" from the Progress editor...

My question re "how long" was: if we knew how to implement this correctly, how long would it take on the Progress DB... minutes/hours?

Once again, I thank you for your replies, and please understand that we have no Progress experience, which I am sure is the way the vendor would prefer to keep it!
 

TomBascom

Curmudgeon
Thanks for the clarifications. It makes more sense now and I can stop worrying that you might be using SQL-89, phew.

The vendor is sort of correct -- the GUI SQL Explorer tool is no longer provided with 10.1B and higher. I'm sure that is largely because there are so many good free alternatives out there. One that is used by many Progress people is called "SQuirreL". The command-line version of SQL Explorer is still there and you could also use that. There are good Progress Kbase entries on this and on running update statistics. There are also quite a few threads on it in various forums here...

It's hard to say how long the 100M row extract will take. You'll just have to try it and find out. If it takes too long there are almost certainly ways to speed it up but you'll have to be willing to get past the idea that nobody but your application vendor can tune the database. That's bunk and any well respected Progress partner knows better somewhere inside their organization. Their sales team might not want to admit it though ;)
 

gah

New Member
Update
I believe I have run update statistics...
The only way I could get this to run was by setting the DSN Advanced option from READ_UNCOMMITTED to blank.
Although the query ran, I am not sure how to check success.

The statement was
UPDATE TABLE STATISTICS AND INDEX STATISTICS
AND COLUMN STATISTICS FOR MyTable ;


If I SELECT * FROM SYSPROGRESS.systblstat I get:

Code:
TBLID   PROPERTY   ATTRIBUTE   VALUE     VAL_TS
   59          2   NULL        NULL      21/09/2009 19:38:56
   59          4   NULL        2816635   NULL

There are no other entries in this table.
Is this what is expected?

This has made no difference to the query extract time at all.

In addition I was advised by the vendor to alter the startup parameters to include

-SQLTempPgSize 16
-SQLTempBuff 2000

or variations of this where SQLTempBuff is calculated as follows:

SQLTempBuff = (-SQLTempPgSize / 8) * 1000
This had no effect, and increasing to

-SQLTempPgSize 64
-SQLTempBuff 8000

slowed the extract by a factor of 3.

any further ideas please?
regards gah
 

Casper

ProgressTalk.com Moderator
Staff member
I have never tested this, but I find the results you are getting highly peculiar.
Regarding update statistics:

Code:
UPDATE TABLE STATISTICS AND INDEX STATISTICS
AND COLUMN STATISTICS FOR MyTable;
commit;

You are missing the commit. I would also run it for all tables, and split it apart into separate table, index, and column statistics statements (otherwise you risk blowing up the lock table).
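
Split out, that looks something like this (PUB.MyTable is just a placeholder; repeat for each table):

Code:
-- One kind of statistics per statement, with a commit after each,
-- so no single transaction holds enough locks to fill the lock table.
UPDATE TABLE STATISTICS FOR PUB.MyTable;
COMMIT;
UPDATE INDEX STATISTICS FOR PUB.MyTable;
COMMIT;
UPDATE COLUMN STATISTICS FOR PUB.MyTable;
COMMIT;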

The output I get when running update statistics from sqlexp is:

OpenEdge Release 10.1B03 as of Sat Nov 3 00:50:00 EDT 2007
Connecting user "sysprogress" to URL "jdbc:datadirect:openedge://localhost:-1;da......"
Update count = 0.
Update count = 0.
Update count = 0.

One "Update count = 0" for each statement.

Output from systblstat should look like:
Code:
SQLExplorer>select * from systblstat;
      TBLID    PROPERTY   ATTRIBUTE       VALUE VAL_TS
----------- ----------- ----------- ----------- -----------------------
         -5           2                         2009-09-20 03:07:59.0
         -5           4                      34
          1           2                         2009-09-20 03:07:59.0
          1           4                    1676
          2           2                         2009-09-20 03:07:59.0
          2           4                    7679
          3           2                         2009-09-20 03:07:59.0
          3           4                     764
          4           2                         2009-09-20 03:07:59.0
          4           4                    5036
          5           2                         2009-09-20 03:07:59.0
          5           4                     264
          6           2                         2009-09-20 03:07:59.0
          6           4                    3802
          7           2                         2009-09-20 03:07:59.0
          7           4                       8
          8           2                         2009-09-20 03:07:59.0
          8           4                       1
          9           2                         2009-09-20 03:07:59.0
          9           4                     250
         10           2                         2009-09-20 03:07:59.0

I think the NULL values pose a problem. Note, however, that you need a user with write permissions (e.g. sysprogress) to be able to run update statistics.

SQLTempBuff = (-SQLTempPgSize / 8) * 1000
..this had no effect and increasing to
-SQLTempPgSize 64
-SQLTempBuff 8000

Since 10.1x these parameters are no longer necessary, since the mechanism for using temp tables has changed: see KB P122494:

http://progress.atgnow.com/esprogress/Group.jsp?bgroup=progress&id=P122494

Casper.
 

gah

New Member
Hi Casper

I will have another go with update stats WITH commit...if I can get this to run from my ODBC connection.

I note your comments re the parameters...I had already read this and passed this back to the vendor.

I am still in discussion with the vendor on how I can run SQL Explorer from the CLI in 10.1C... am told the GUI is not available... and still awaiting a response.

thanks for your input...much appreciated.
regards gah
 

Casper

ProgressTalk.com Moderator
Staff member
I am still in discussion with the vendor on how I can run SQLExplorer from CLI in 10.1C

You can easily run it from proenv: just type sqlexp -help for an explanation of the command-line utility.
Just make sure you have a full install of the Progress runtime client.
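
For example (the database name, host, port, and password here are placeholders; use your own broker settings):

Code:
proenv> sqlexp -db tst_data -H localhost -S 2500 -user sysprogress -password xxxxx
SQLExplorer> UPDATE TABLE STATISTICS AND INDEX STATISTICS
AND COLUMN STATISTICS FOR PUB.MyTable;
SQLExplorer> COMMIT;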

Casper
 

gah

New Member
Have run "update statistics" from SQLExplorer CLI for INDEX/TABLE/COLUMNS on this particular table.

Unfortunately this has not delivered any improvement at all to the extract time.

A little more experimentation provided the following:

SELECT COL_1 FROM PUB.MYTABLE
WHERE COL_1 > 0 AND COL_1 < 1000000

SELECT COL_1 FROM PUB.MYTABLE
WHERE COL_1 > 1000000 AND COL_1 < 2000000

SELECT COL_1 FROM PUB.MYTABLE
WHERE COL_1 > 2000000 AND COL_1 < 3000000

where COL_1 is the PK.

All of the above run in under 4 seconds.


If I then add a second column to the extract (it doesn't seem to matter whether it is part of an index or not) I get the following results:

SELECT COL_1, COL_2 FROM PUB.MYTABLE
WHERE COL_1 > 0 AND COL_1 < 1000000
Time to run: 15 secs / 6 secs

SELECT COL_1, COL_2 FROM PUB.MYTABLE
WHERE COL_1 > 1000000 AND COL_1 < 2000000
Time to run: 1 min 50 secs / 10 secs

SELECT COL_1,COL_2 FROM PUB.MYTABLE
WHERE COL_1 > 2000000 AND COL_1 < 3000000
Time to run: 8 min 40 secs / 1 min 11 secs

The times are for first and second runs. As you can see, there are significant differences as we attempt to return data... hmmm?

The PK is an identity int column.

any ideas please?
 

gah

New Member
We have also resurrected our old 9.1E server and are running the same queries against the same table... we do not experience any slowdown in the extract at all.

The only differences are the Progress version and the ODBC driver.

regards gah
 

LarryD

Active Member
Just curious... do you have another key where the leading parts of the key are COL_2 + COL_1? Or perhaps a key where COL_2 is the first part of the key?

You might want to give an INDEX hint a try, just for grins:

SELECT column_list
FROM table_name WITH (INDEX ( index_val ))
WHERE ...
 

gah

New Member
Larry, thanks for your thoughts.

I tried this a while back and, as I recall, it made no difference.

Ultimately I require all fields from this table... so this will not help the cause, I am afraid.

This problem is not specific to a single table either... I have tested similar extracts on other, larger tables and get the same slowdown.

thanks anyway
 

Casper

ProgressTalk.com Moderator
Staff member
I know the database is not under your control, but I am very interested to know more about the condition of the database you're working on.
What is the condition of the database? Does it have Type II storage areas?
How about fragmentation and scattering? Does the table have its own storage area?

It would be nice if you could post the dbanalys output for that table (index and table information).

Casper.
 

gah

New Member
Not sure about the storage areas...how would I find out?

We can run dbanalys... can we do that on a single table or does it have to be the full DB (430 tables)?

gah
 

TomBascom

Curmudgeon
I was about to say what Casper just said ;) I'm guessing that all of your data is stuffed into the schema area :(

To see what you have for storage areas just run:
Code:
prostrct list dbname

You can get dbanalys on a per areas basis:

Code:
proutil dbname -C dbanalys AreaName

(The area name is an optional argument.)
 

gah

New Member
These are the results

OpenEdge Release 10.1C03 as of Sun Feb 1 00:37:52 EST 2009

Area Name: Control Area, Type 6, Block Size 4096, Extents 1, Records/Block 32, Cluster Size 1
Ext # 1, Type VARIABLE, Size 32 KByte, Name: D:\tst_data.db

Area Name: Primary Recovery Area, Type 3, Block Size 8192, Extents 3

Ext # 1, Type FIXED , Size 1024000 KByte, Name: D:\tst_data.b1
Ext # 2, Type FIXED , Size 1024000 KByte, Name: D:\tst_data.b2
Ext # 3, Type VARIABLE, Size 8 KByte, Name: D:\tst_data.b3

Area Name: Schema Area, Type 6, Block Size 4096, Extents 15, Records/Block 32, Cluster Size 1
Ext # 1, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d1
Ext # 2, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d2
Ext # 3, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d3
Ext # 4, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d4
Ext # 5, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d5
Ext # 6, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d6
Ext # 7, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d7
Ext # 8, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d8
Ext # 9, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d9
Ext # 10, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d10
Ext # 11, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d11
Ext # 12, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d12
Ext # 13, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d13
Ext # 14, Type FIXED , Size 2048000 KByte, Name: D:\tst_data.d14
Ext # 15, Type VARIABLE, Size 64 KByte, Name: D:\tst_data.d15


Any idea how long dbanalys will take to run... a 30GB DB with 430 tables?
Is it processor/memory hungry?

thanks
 

LarryD

Active Member
Eeeek!

I'm sure that Tom, Casper, Thomas, and others will chime in here with some expert advice, but here are a few thoughts...

I'm going to hazard a guess that the DB has not been dumped/reloaded in recent memory? If not, it is likely that the data and indexes have some significant scatter factors.

This is definitely ripe for Type II storage areas and some expert DB tuning. FWIW, my experience is that Type II will improve the speed of SQL/reports/et al significantly.

I can't speak to processor/memory usage for dbanalys, but I would not be surprised if it took a good 20-30+ minutes to run. FYI, you can redirect the output to a text file, which can then be loaded into an Excel spreadsheet for easier manipulation and sorting, as in the example below.
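
Something like this from proenv (dbname and the output filename are placeholders):

Code:
proutil dbname -C dbanalys > dbanalys.txt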
 