SQL-92 debugging

Rob Fitzpatrick

ProgressTalk.com Sponsor
A bit of background: I'm trying to devise a way to get all table record counts quickly, i.e. more quickly than tabanalys. The objective is to be able to produce before-and-after lists of all user tables and their record counts for use in a dump and load, and diff them to ensure that all records that were dumped got loaded to the new DB. There are a few KB articles about different ABL strategies for fast counts, like using queries or using FOR-EACH with FIELDS. I know that ABL is slow for this.
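For context, the FOR EACH ... FIELDS counting technique from the KB articles looks roughly like this ("my-table" is a placeholder table name here, not from my schema):

Code:
/* Sketch of the KB's FOR EACH ... FIELDS counting technique.
   An empty FIELDS list tells the AVM to fetch as little of each
   record as possible, which speeds up the scan. */
def var v-count as integer no-undo.

for each my-table fields() no-lock:
  v-count = v-count + 1.
end.

display v-count.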

For comparison I played with 'select count(*) from PUB."my-table"' in sqlexp. In a medium-sized table (~ 300,000 records) my first ABL query ran in 11.7 seconds, and subsequent ones were consistently around 4.5 seconds, with either technique. (I guess the first query warmed up the buffer pool?) I didn't time the SQL query, but it ran sub-second; almost instantaneous. All queries were run on the DB server, and the 4GL client was self-service. I don't really understand why the SQL-92 engine is so much faster but for now I decided to just accept that fact, go with the SQL approach, and see where it leads me. I have very little experience with using SQL clients in Progress.

I'm experimenting with writing and running SQL queries from ABL, and then kicking the results to a flat file I can parse later into a format suitable for diffing. The problem is I'm getting errors that I can't find a reason for. Most of my queries return results and they agree with tabanalys output, but every now and again in the output I get a SQL error, like 7519 (Table/View/Synonym not found) or 10713 (Syntax error at or about (statement excerpt)). But it's not as if I'm writing the queries manually; the data is coming from _File.

So here is my code (ugly and incomplete :rolleyes:, but more or less functional):

Code:
/* Count records in all tables */

def var v-dbname   as char    no-undo format "x(11)".
def var v-sqlport  as integer no-undo format ">>>>9".
def var v-user     as char    no-undo format "x(16)".
def var v-password as char    no-undo format "x(16)".
def var v-outfile  as char    no-undo format "x(40)".

update v-dbname   label "Logical DB name" colon 20 with side-labels.
update v-sqlport  label "SQL port number" colon 20 with side-labels.
update v-user     label "DB User name"    colon 20 with side-labels.
update v-password label "Password" blank  colon 20 with side-labels.

v-outfile = "count_" + v-dbname + "_"
          + string( year( today ),"9999" )
          + string( month( today ),"99" )
          + string( day( today ),"99" )
          + "-"
          + replace( string( time, "HH:MM" ), ":", "" ).

def stream out.

output stream out to recordcounts.sql.
put stream out unformatted "set transaction isolation level READ UNCOMMITTED ;" skip.
put stream out unformatted "commit work;" skip.

for each _file no-lock where _file._tbl-type = "T":
  put stream out unformatted "select count(*) from PUB." quoter( _file._file-name ) ";" skip.
end.

put stream out unformatted "commit work;" skip.

output close.

os-command value( "sqlexp -char -db " + v-dbname
                + " -S "              + string( v-sqlport )
                + " -infile recordcounts.sql"
                + " -outfile "        + v-outfile
                + " -user "           + v-user
                + " -password "       + v-password  ).

The first time I ran it against a 353-table test DB, it ran almost perfectly. The output looks like this:

Code:
            count(*)
--------------------
                 268
            count(*)
--------------------
                  99
            count(*)
--------------------
                 648
<etc.>

I expect it to look like that; I haven't done any parsing logic yet. But at the 352nd table the output file ended with:

Code:
=== SQL Exception 1 ===
SQLState=HY000
ErrorCode=-210056
[DataDirect][OpenEdge JDBC Driver][OpenEdge] Syntax error in SQL statement at or about "se" (10713)

So does the "se" refer to the beginning of a select statement? And if so, why just that one? I've been up and down the generated .sql file and can't find an obvious problem. I removed the last two selects and ran it manually and it worked. I retyped the second-to-last one and ran it (352 selects in the file) and it worked. I retyped the last one back into the file (353 selects again) and it worked. So I had two files, one that throws an error and one that doesn't, and they're identical. Byte for byte. I don't mind errors, but phantom errors that come and go drive me nuts.

I wondered if random TCP errors could be involved, so I put the client and server on different machines and scoped the traffic with tcpdump. But I don't really know what I'm looking at. I can see the select statements going to the server and the record counts coming back to the client, but I don't know the layout of all the data, so the errors are hard to find. Is the wire protocol documented somewhere? Would that be part of the SQL standard, or ODBC?

If anyone can point out what I'm doing wrong (apart from making work for myself :)) or suggest why I'm seeing these errors, I'd appreciate the help.
 

TomBascom

Curmudgeon
Perhaps the file is too large? What happens if you break it into chunks? Say no more than 100 tables per chunk?

The reason that SQL is faster is that it can do table scans without using an index. OE11 is supposed to have that available for the 4GL.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
So when you say too large, do you mean just the number of queries in the .sql file, or the number of consecutive queries before a commit work? I've had the latter issue when doing update statistics as you can blow the lock table. But I didn't think that I had a scope issue as I'm only doing selects. Also, I did try doing commit work after every select; it made no difference.

The interesting thing is that I tried the code on a DB with much more data, but basically the same table count, and I got more frequent errors (10 in total). So could it be related to record count? I'll try doing it in chunks, although that feels like a workaround. The concern is that I'm hitting a constraint but I don't know what it is, so I don't know whether this workaround will avoid the constraint in every case. The extreme would be to run sqlexp inside the FOR EACH, but all that process creation and destruction is expensive on the server.
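A chunked version of the generation loop would look something like this (just a sketch, using Tom's suggested 100 tables per chunk and hypothetical file names recordcounts1.sql, recordcounts2.sql, ...):

Code:
/* Sketch: split the selects into chunks of v-chunk per .sql file. */
def var v-chunk as integer no-undo initial 100.
def var v-count as integer no-undo.
def var v-file  as integer no-undo initial 1.

def stream out.
output stream out to value( "recordcounts" + string( v-file ) + ".sql" ).

for each _file no-lock where _file._tbl-type = "T":
  put stream out unformatted
    "select count(*) from PUB." quoter( _file._file-name ) ";" skip.
  v-count = v-count + 1.
  if v-count modulo v-chunk = 0 then do:
    output stream out close.   /* flush and close this chunk */
    v-file = v-file + 1.
    output stream out to value( "recordcounts" + string( v-file ) + ".sql" ).
  end.
end.

output stream out close.

Each chunk file would then get its own sqlexp run.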
 

LynnReas

New Member
I have seen SQL errors with OE 10.1B resolved many times by adding the following to the database configuration. Maybe it will help.
defaultConfiguration : Other Server Arguments
-SQLTempBuff 150000 -SQLTempDisk 100000000 -SQLTempPgSize 256

Restart database.

If it still fails then upgrade. :>
In 10.2B, add these to the proper fields in OE Explorer.
Interestingly, SQL queries that never worked in 10.1B magically work in 10.2B...

Lynn
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Thanks Lynn, I'll look into those params. This is running on 10.2B02. I might also try it in 11.0 to see if there's any difference, but I doubt it. PSC have been pretty quiet about SQL in this release. I think most of the DB investments were in getting multi-tenancy done. Although maybe ABL table scans will pay dividends.

Parameter tweaking aside, it doesn't feel to me like I'm doing anything exotic here. It would be nice to be able to get record counts from a DB without having to bounce it first. Well, when all else fails, read the manual. I've browsed the SQL manuals but I can't say I've read them cover to cover. Time to hit the books...
 

TomBascom

Curmudgeon
I was just thinking that error message seemed like it might be saying that somewhere around "se" something became too long -- the file, the statement, a string, or whatever... Is the file length anywhere near a "magic" number? Are there any tables with problematic or unusual characters in their names? Anything that might cause an unclosed quote? Or a need for quotes? (I see that you are using quoter -- but maybe something went wrong?)
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
I was just thinking that error message seemed like it might be saying that somewhere around "se" something became too long -- the file, the statement, a string, or whatever... Is the file length anywhere near a "magic" number?
The file is 13967 bytes. The "se" in select at the line that errored out is at offset 13880 (0x3638); not magical. Every line begins with 'select count(*) from PUB."', so line length is a function of table name length. The longest table name is 21 bytes; this one is 10.

Are there any tables with problematic or unusual characters in their names?
Table name characters: [a-z], -, 1 (letters, hyphens, ones); nothing illegal or problematic for SQL or sqlexp, provided the names are quoted. The only thing remotely unusual is that the previous table's name is a subset of this one; i.e. "xlate" worked; "xlate-lang" errored out. But there are several other instances of this pattern earlier in the list that didn't error out.

Anything that might cause an unclosed quote?
Don't think so. I have 353 tables and 706 double quotes.

Or a need for quotes? (I see that you are using quoter -- but maybe something went wrong?)
I'm using quoter, following a KB code sample, because my table names contain hyphens.
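For anyone following along, a quick way to see what QUOTER emits (a sketch, using the table name from my error):

Code:
/* QUOTER wraps a character value in double quotes, which is what
   the SQL engine needs for identifiers containing hyphens. */
display quoter( "xlate-lang" ) format "x(20)".  /* shows "xlate-lang", quotes included */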
 

tamhas

ProgressTalk.com Sponsor
Hyphen is *not* legal in a SQL identifier. You must enclose any Progress table or field name that contains a hyphen in double quotes. Can you show us the actual SQL around where the error occurs?
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
The last six lines of recordcounts.sql (with my comments to the right):

Code:
select count(*) from PUB."vendor";            <---worked
select count(*) from PUB."vendor-service";    <---worked
select count(*) from PUB."xlate";             <---worked
select count(*) from PUB."xlate-lang";        <---errored out (10713)
select count(*) from PUB."zoning";            <---not reached
commit work;

Double quotes are properly paired throughout the file. It's ABL-generated on Linux, so there are no funky Word-style “smart quotes” or em-dashes to worry about.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
A thought occurred: would any of the query plan info in the SQL catalog tables be useful for debugging this? Or enabling client request statement caching so I could see what the last query really was, from the DB's perspective?
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
No, but I have manually run the .sql file in sqlexp successfully. I'll re-run that test to explore what works manually and what doesn't.

So could the issue be more related to running sqlexp with os-command, as opposed to the file's contents?
 

TomBascom

Curmudgeon
Grasping at straws...

Does the generated script always blow up?

If it does:

1) What happens if you comment out (or remove) the "xlate-lang" line? How about the line above? The line below?

2) What happens if you remove the top X lines (where X is 1, 5, 10, 50, 100...)
 

TomBascom

Curmudgeon
A thought occurred: would any of the query plan info in the SQL catalog tables be useful for debugging this? Or enabling client request statement caching so I could see what the last query really was, from the DB's perspective?

I think that would be very interesting to look at -- especially if you have a repeatable problem.
 

TomBascom

Curmudgeon
In your posted code you have:
Code:
output stream out to recordcounts.sql.

But you only do "output close." at the end of the generation block, instead of "output stream out close.".

The output stream is buffered and still open -- thus the last part may not have been flushed to disk when you execute OS-COMMAND().
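I.e. the end of the generation block should read (sketch):

Code:
put stream out unformatted "commit work;" skip.

output stream out close.   /* closes and flushes the named stream;     */
                           /* "output close." only targets the unnamed */
                           /* default stream, leaving "out" open       */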
 

tamhas

ProgressTalk.com Sponsor
You mean the same generated .sql file works when handed to sqlexp manually, but if you run sqlexp via OS-COMMAND, you get the error?

If true, then I would try dividing the file in two and running it in two pieces.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Grasping at straws...
Me too!

Does the generated script always blow up?
It seems very consistent against a given database.

If it does:

1) What happens if you comment out (or remove) the "xlate-lang" line? How about the line above? The line below?
Without the last two select lines it works. I have to do more testing to figure out exactly what works and what doesn't. But I get very different results connecting to another server with a similar schema (a few fewer tables) but lots more data. In that case I have 10 SQL errors scattered throughout the output.

2) What happens if you remove the top X lines (where X is 1, 5, 10, 50, 100...)

I think I could spend days reverse-engineering exactly the parameters of what works reliably and what doesn't. But that still wouldn't tell me why this is happening, nor whether this code will work reliably on the next DB I encounter. It isn't production code, but I still want it to work.

So my fundamental problem is that I don't understand enough about what is happening here, and what constraint I am hitting. I'll let you know if I figure anything out. It isn't my highest-priority item, but it would be helpful for D&Ls.

P.S. Tom: speaking of D&L, I believe you did an EMEA PUG presentation showing your new highly-parallel D&L utility. Have you released the code for that (assuming it's FOSS)?
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
You mean the same generated sql file when handed manually sqlexp works, but if you process the sqlexp in OS-COMMAND, you get the error?

If true, then I would try dividing the file in two and running it in two pieces.
Yes, exactly. I'll try splitting it in two. And yet on another DB it throws its first error on the tenth table, and then on nine more tables.

Specifically, tables 10, 43, 122, 141, 160, 180, 195, 208, 257, and 337. This DB is an earlier version of the same application, so the vast majority of table names are common to both. So I really don't think it's an issue there. It feels more like a run-time constraint, like I'm blowing up some temp space in memory perhaps. I'll also look into those startup parameters that Lynn suggested and see if they make a difference.
 