Num records in table

joey.jeremiah

ProgressTalk Moderator
Staff member
thanks!


TomBascom said:
I'd like to see the cost based optimizer used by the 4gl too. But I'm not holding my breath. It is, by all accounts, a big leap -- not for the database engine but for the 4gl. I haven't tried this but I've long thought it would be interesting to take the cost statistics that are gathered for SQL and apply them to selectively over-ride 4gl index selection. If I ever run into one of those horrible queries that you seem to so commonly find I plan to give it a whirl.
i also thought about reaching into the sql stats but afaik they're not available to the 4gl through VSTs or any other way.

it would probably have to be some sort of java bolt-on? i don't know about that.
 

TomBascom

Curmudgeon
"Bolt on" might be a bit extreme :awink: You just need to get comfortable using SQL and the natural route to that is via Java. If you check out the "Update Statistics" entry in the "OpenEdge Data Management: SQL Reference" manual I think that you'll see that all the information that you need is there.
 

Casper

ProgressTalk.com Moderator
Staff member
Interesting!

I don't know if I understand this correctly; I never thought about this concept. Do you mean using the SQL statistics to dynamically form the "right" query by applying the gathered statistics? That could involve completely reorganising the query on the fly, or dynamically forcing a different index. And all that in order to make a better "index guess" than the current rules-based optimiser?
Or even join order depending on the table cardinality (which could change from site to site?)

(Or am I completely wrong here....)

Could you please fill me in on the idea?

Regards,

Casper.
 

TomBascom

Curmudgeon
1) Use SQL to UPDATE STATISTICS.

2) Propose a query to SQL -- it would be useful to have a 4gl WHERE CLAUSE to SQL translator.

3) Examine SQL's execution plan for the proposed query.

4) Apply that plan back to the 4gl query. It would be handy to have a SQL to 4GL translator :rolleyes:. For the most part this probably consists of re-ordering sub-queries and applying USE-INDEX to override the 4gl compiler's choices.

At first I would do this manually and on a case by case basis. If the results were promising then it would be worth trying to automate. Making it dynamic would have to be down the road somewhere.
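
To make step 4 concrete, here is a minimal sketch of the kind of override I mean. The names are from the sports2000 sample database and the index choice is invented; in practice it would come from the SQL plan:

Code:
/* suppose sql's plan preferred the OrderDate index over
   whatever the 4gl compiler picked from the where clause */
for each order no-lock
    where order.OrderDate >= date(1, 1, 2007)
    use-index OrderDate:
    /* ... process the order ... */
end.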
 

joey.jeremiah

ProgressTalk Moderator
Staff member
going back to no-index and remote clients

the root difference between 4gl queries and sql is that sql streams data while the 4gl navigates through data with cursors.

no-index scans would also mean no-index cursors, probably. of course it's possible, recids are sequential in the table's cluster. but it also means the engine will have to accommodate that quirk.

i'd guess the server-side single-table queries for remote clients allow the client to work in pretty much the same way as a self-service client with this added extension. but, again, the unordered results are a huge handicap and would have to be addressed at some point.

a self-service client is pretty unique even in terms of efficiency (besides the added control). forget about native drivers, you can't get more direct than that.
 

joey.jeremiah

ProgressTalk Moderator
Staff member
TomBascom said:
1) Use SQL to UPDATE STATISTICS.

2) Propose a query to SQL -- it would be useful to have a 4gl WHERE CLAUSE to SQL translator.

3) Examine SQL's execution plan for the proposed query.

4) Apply that plan back to the 4gl query. It would be handy to have a SQL to 4GL translator :rolleyes:. For the most part this probably consists of re-ordering sub-queries and applying USE-INDEX to override the 4gl compiler's choices.

At first I would do this manually and on a case by case basis. If the results were promising then it would be worth trying to automate. Making it dynamic would have to be down the road somewhere.
a few hiccups -

1. you can only use a single index with use-index, like v6 queries.

2. it would require the sql engine to always be present, which is not something that can be counted on, in my opinion.

in that case, why not use something like run stored-procedure to send a sql statement through a dataserver? doesn't sound right to me.
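
for reference, the dataserver idiom i mean looks roughly like this -- ms sql dataserver syntax, from memory, so treat it as a sketch:

Code:
/* send raw sql through the dataserver and read the rows back.
   proc-text-buffer is the dataserver's generic result buffer. */
define variable h as integer no-undo.

run stored-proc send-sql-statement h = proc-handle
    ("select count(*) from customer").

for each proc-text-buffer:
    display proc-text-buffer.
end.

close stored-proc send-sql-statement where proc-handle = h.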
 

Casper

ProgressTalk.com Moderator
Staff member
joey.jeremiah said:
no-index scans would also mean no-index cursors, probably. of course it's possible, recids are sequential in the table's cluster. but it also means the engine will have to accommodate that quirk.

Sequential reads with no index would be faster when using a Type II storage area. Since the blocks in Type II areas are non-social, to read all the entries in a table you just follow the pointers connecting the clusters and read the data in them. Kinda like flushing the data out.

If I'm not mistaken, the database engine is already prepared for that. That's one of the reasons why EMPTY TEMP-TABLE is instantaneous: temp-tables already work that way.
Deleting a table would then be no more than deleting the pointers.
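
Just to illustrate the temp-table point, a tiny sketch:

Code:
define temp-table tt no-undo
    field f as integer.

/* ... fill tt with a few million rows ... */

/* returns immediately: the rows go away by dropping the
   underlying storage, not by deleting them one at a time */
empty temp-table tt.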

That's one of the big frustrations: although the database internals are ready for this, the ABL is not...

Casper.
 

TomBascom

Curmudgeon
A few hiccups, sure. But I still think it's worth adding to the toolbox.

I didn't claim that it would solve world hunger ;p
 

Casper

ProgressTalk.com Moderator
Staff member
Pushed the submit button too quick :)

joey.jeremiah said:
i'd guess the server-side single-table queries for remote clients allow the client to work in pretty much the same way as a self-service client with this added extension. but, again, the unordered results are a huge handicap and would have to be addressed at some point.

you're absolutely right.
But then again, depending on the performance gain when handling big tables, it could be worthwhile to sort the results somehow at a later stage.

Casper.
 

joey.jeremiah

ProgressTalk Moderator
Staff member
... but there's another option, simple and it works.

it also has a use for a cost-based optimizer, but more as a programmatic override.

maybe a query "tuner" is a more suitable description?


because the buffers' join order is dynamic, the where clause is entered for the entire query.

one of the interesting challenges was parsing the where clause, simplifying the conditions and finally spreading them across the query buffers.

sorry guys, it's part of a bigger project i've been working on, so i think i'll hang on to this one.

we all have our crazy projects we're working on, right :)
 
Joey,

I'm assuming you've noticed it already, but if not, you have a chance to offload a little at the Peg campfire with Gus.

Bring your virtual guns and bullets.
 

Casper

ProgressTalk.com Moderator
Staff member
And maybe even better,

you should all go to PTW in Athens (not so far from Israel :). Then we can talk AND drink beer!!

Casper.
 

Casper

ProgressTalk.com Moderator
Staff member
Shall I write him an e-mail about how important it is to meet your peers, gain knowledge, broaden your view, and advertise your company by being there? :D
 

BCM

Member
Hello. I found this thread when I did a search on "record" and "count". Our firm runs a Progress application. We run reports off copies of the Progress data. Our reporting tools are M$. So, we copy the Progress data to SQL Server. As an auditing requirement, we need to show that the count of records copied to SQL Server equals the count of records in the Progress table. We copy almost 70 tables daily. Some of the tables contain upwards of 4 million records.

As I write this I am running the 4GL code, below, which was recommended in one of the early posts in this thread.

Code:
define query q1 for <tablename> scrolling.
etime(yes).
open query q1 preselect each <tablename>.
display num-results("q1") etime.

The table being queried has over 3.7 million records. The query has been running for 15 minutes and is not yet finished. On the SQL Server side, I can execute the statement "select count(*) from <tablename>" and receive the results in about 6 seconds.

Whether using 4GL or SQL coding, what is the fastest way to obtain the record count of a Progress table?
 

joey.jeremiah

ProgressTalk Moderator
Staff member
Hello BCM !!!

How are you? I mean, besides Progress :)

To count records fast, really fast, in progress or any other database, the query would have to be satisfied using only the index, without accessing the table data. Indexes can be hundreds and even thousands of times smaller than the table data, besides record scattering, fragmentation issues, etc.

AFAIK, the progress SQL engine for the version you're using does not support this feature (you could just give it a try if you haven't done that already). And 4GL queries always go through the table data.

You should also consider that, from what I remember from our previous discussions/fights ;), there hasn't been a dump and load done on your progress database in many, many years, while you're rebuilding the MSSQL database, I'm guessing, every night, so it doesn't have the scattering and fragmentation issues you have with the progress database.

There is a way to count only the index entries using an undocumented utility. I don't remember exactly, but I think it also counts stale key entry holders (stale place holders left by transactions), or maybe it was because there's no way to tell root/intermediate blocks apart from leaf blocks. So it wouldn't be dead accurate, but it would give you a close figure very, very fast.

Not very long ago I wrote a post about possibly writing a log-based replication engine (probably open source) from progress to mySQL (but it can also be done for MSSQL) for reporting purposes and licensing issues. That would give you much more up-to-date data (within a few minutes) and it can also be controlled and monitored. I think that would be a much better overall solution, though it's a pretty big step to take.

Even if you aren't personally interested, what do you think?
 

BCM

Member
Hello, Joey, it's nice to hear from you. I'm fine, thank you. We dumped and reloaded our data about three months back. During the reload we moved several tables to individual storage areas. The Progress database has been working much faster since then. However, obtaining the record count of a large table is still a very slow process.

I think your replication idea is good. In this firm, we would not choose to use something that ran within the Progress database for fear that we might have a negative impact on the licensed application. We rely 100% on the vendor for support because we have no code or technical documentation. Would your replication method run external to the Progress database?
 

TomBascom

Curmudgeon
The fastest method to get a simple record count is usually to run:

Code:
proutil dbname -C tabanalys

And then quickly parse the output (on UNIX systems grep ^PUB. returns the relevant lines).

Note that proutil is "area aware" so you could run multiple copies in parallel and restrict them to a particular area. Do this by adding the area name to the end of the proutil command.
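
And if it has to happen from 4GL code rather than a shell script, a rough sketch of the same parse ("mydb" is a placeholder for your database):

Code:
define variable cLine as character no-undo.

/* run a tabanalys and keep just the per-table lines,
   the same trick as the grep above */
input through value("proutil mydb -C tabanalys").
repeat:
    import unformatted cLine.
    if cLine begins "PUB." then
        put unformatted cLine skip.
end.
input close.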
 

BCM

Member
Tom, I'd like to try the approach you recommend. All this must be done from a script or code. Can I use ProUtil from a remote box?
 