Lock table overflow during Update Statistics

Rob Fitzpatrick

ProgressTalk.com Sponsor
I'm preparing to run Update Statistics in a client's prod environment (AIX 7.1, OE 11.6.0). First I ran it in a test DB on a different system (SLES 11.1, OE 11.6.3) and didn't have any errors. The test DB was created via D&L, not prorest. This is the code I run to create a SQL script:

Code:
/* Generate a SQL UPDATE STATISTICS script to be run from sqlexplorer.
   From KB article P115266 (S000011034)
   https://knowledgebase.progress.com/articles/Article/P115266/p

   To run the output of this program:
   sqlexp -db <dbname> -S <SQL broker port>
      -infile UpdateStatistics_<dbname>.sql
      -outfile UpdateStatistics.out
      -user <sql username> -password <password>

   where <username> is a DB user with SQL access
*/

define variable v-tbl-has-recs  as logical  no-undo.
define variable bh-table-buffer as handle   no-undo.

output to value( "updatestatistics_" + ldbname( "dictdb" ) + ".sql" ).

for each dictdb._file no-lock where _file._file-number > 0 and _file._file-number < 32768:

  create buffer bh-table-buffer for table _file._file-name.

  v-tbl-has-recs = bh-table-buffer:find-first( "", no-lock ) no-error.

  if v-tbl-has-recs then        /* only update the statistics if the table contains data */
  do:
    put unformatted "update table statistics and index statistics and all column statistics for PUB." quoter( _file._file-name )";" skip.
    put unformatted "commit work;" skip.
  end.

  delete object bh-table-buffer.

end.

output close.

On the internal SLES test box, where the SQL script succeeded, this is an excerpt from the log:
Code:
SRV     3: (8873)  Login usernum 66, remote SQL client.
SRV     3: (7129)  Usr 66 set name to sysprogress.
SQLSRV2 3: (-----) Login by user 66. Using TCP/IP IPV4 address 127.0.0.1
SRV     3: (17741) Enabling table locks.
SQLSRV2 3: (10996) %Database statistics (table,index,column) updated for table PUB.app-asset-liab.
SQLSRV2 3: (10996) %Database statistics (table,index,column) updated for table PUB.app-cif.
...  (other table messages omitted for brevity)
SQLSRV2 3: (10996) %Database statistics (table,index,column) updated for table PUB.xml-schema.
SQLSRV2 3: (10996) %Database statistics (table,index,column) updated for table PUB.xml-setup.
SQLSRV2 3: (10996) %Database statistics (table,index,column) updated for table PUB.xml-transaction.
SQLSRV2 3: (-----) Logout by user 66.
SRV     3: (453)   Logout by sysprogress on  .

Note the (17741) message near the beginning. It was run via sqlexp with just the required settings. I didn't change the session's transaction isolation level.

Next we ran it on the client's test server (AIX 7.1, OE 11.6.0), on a copy (via prorest) of the prod DB. The log looked like this:

Code:
SQLSRV2 14: (-----) Login by user 79. Using TCP/IP IPV4 address 16.86.186.224
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.app-asset-liab.
SRV    14: (915)   Lock table overflow, increase -L on server
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.app-cif-addr.
...
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.app-security-bldg.
SRV    14: (915)   Lock table overflow, increase -L on server
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.company-interface.
...
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.ISO-Setup.
SRV    14: (915)   Lock table overflow, increase -L on server
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.loan-app-detail.
...
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.sys-ctrl.
SRV    14: (915)   Lock table overflow, increase -L on server
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.trans-log.
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.xml-map.

Note the absence of a (17741) message. I don't know the significance of this; I don't see anything in the SP01-SP03 release notes related to Update Statistics and locking. This script was run the same way, with the same settings, as the first run which succeeded.

This was done after hours and there shouldn't have been anyone else logged in and taking lots of locks (I haven't yet proven this conclusively). But I doubt the LTO was caused by other (ABL) client activity. This is a secondary application DB where there are few locks taken generally, and it's a test environment. Also, the Update Statistics was later run against the primary application database and it also threw LTO errors.

What's also odd is that there seems to be no correlation between table sizes and the overflows. These are the tables whose updates were current during the lock table overflows, and their record counts and sizes in bytes (from dbanalys):

Code:
table              records    size (b)  
app-asset-liab       4,026     400,486
app-security-bldg  190,365  13,107,200
ISO-Setup                8         361
sys-ctrl                16         639

Finally, from the docs on Update Statistics:
"Obtaining statistics does not require an exclusive lock on the
schema or any table locks. Rows written to statistics tables will be exclusively locked, as in
every transaction performing updates. Therefore, statistics can be obtained while normal
database operations continue."

Questions:
  • What is the significance, if any, of the 17441 message (11.6.3 on Linux)?
  • Why would the Update Statistics cause LTO erorrs when it did, especially when there are other tables in the same DB with millions of records and GB of data, where it didn't overflow the lock table?
  • Apart from the records being written in the handful of SQL_STAT tables, what should be locked? Seems like nothing, according to the docs.
  • Has anyone else experienced such issues during a mass Update Statistics?
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Another oddity, which I can repro on Linux 11.6.3: half an hour after the last Update Statistics statement is run, the broker writes another message: "Disabling table locks."

Code:
[2018/01/16@17:06:41.073-0500] P-75973      T-140438450697984 I SRV     1: (17741) Enabling table locks.
[2018/01/16@17:06:41.126-0500] P-75973      T-140438450697984 I SQLSRV2 1: (10996) %Database statistics (table,index,column) updated for table PUB.Invoice.
[2018/01/16@17:06:41.180-0500] P-75973      T-140438450697984 I SQLSRV2 1: (10996) %Database statistics (table,index,column) updated for table PUB.Customer.
[2018/01/16@17:06:41.917-0500] P-75973      T-140438450697984 I SQLSRV2 1: (10996) %Database statistics (table,index,column) updated for table PUB.Item.
[2018/01/16@17:06:41.962-0500] P-75973      T-140438450697984 I SQLSRV2 1: (10996) %Database statistics (table,index,column) updated for table PUB.Order.
[2018/01/16@17:06:42.069-0500] P-75973      T-140438450697984 I SQLSRV2 1: (10996) %Database statistics (table,index,column) updated for table PUB.Order-Line.
[2018/01/16@17:06:42.082-0500] P-75973      T-140438450697984 I SQLSRV2 1: (10996) %Database statistics (table,index,column) updated for table PUB.Salesrep.
[2018/01/16@17:06:42.097-0500] P-75973      T-140438450697984 I SQLSRV2 1: (10996) %Database statistics (table,index,column) updated for table PUB.State.
[2018/01/16@17:06:42.160-0500] P-75973      T-140438450697984 I SQLSRV2 1: (10996) %Database statistics (table,index,column) updated for table PUB.Local-Default.
[2018/01/16@17:06:42.181-0500] P-75973      T-140438450697984 I SQLSRV2 1: (10996) %Database statistics (table,index,column) updated for table PUB.Ref-Call.
[2018/01/16@17:06:42.185-0500] P-75973      T-140438450697984 I SQLSRV2 1: (-----) Logout by user 24.
[2018/01/16@17:06:42.185-0500] P-75973      T-140438450697984 I SRV     1: (453)   Logout by robf on  .
[2018/01/16@17:07:42.746-0500] P-76149      T-140143876982528 I DBANALYS5: (452)   Login by robf on /dev/pts/2.
[2018/01/16@17:07:42.750-0500] P-76149      T-140143876982528 I DBANALYS5: (7129)  Usr 5 set name to Dbanalys.
[2018/01/16@17:07:42.750-0500] P-76149      T-140143876982528 I DBANALYS5: (7129)  Usr 5 set name to robf.
[2018/01/16@17:07:42.843-0500] P-76149      T-140143876982528 I DBANALYS5: (453)   Logout by robf on /dev/pts/2.
[2018/01/16@17:37:43.061-0500] P-75815      T-139700488828672 I BROKER  0: (14637) Disabling table locks.
 

TomBascom

Curmudgeon
I don't have an explanation or a solution but in the way of "moral support" I vaguely recall occasionally needing to bump -L up in order to get a successful "update statistics" run.

So at least you probably aren't alone.

In the spirit of "trust, but verify" you might want to be doubly sure that the generated script really does have "commit work;" correctly inserted after each SQL statement. Your code *looks* correct but you didn't publish the generated script so I cannot say for sure.

Sheer speculation -- are you doing anything with the new autonomous schema update parameters? (The stuff that auto-truncates/updates SQL width problems...)
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Thanks Tom. There is indeed a "commit work;" after every "update statistics".

I played with ADT/ASU when 11.6 came out, and was planning to deploy it with this client. However I couldn't get it working as expected in my testing and haven't found time to return to it, either to do more testing or open a case. For now, as a stopgap I have a weekly dbtool cron job.

I thought about bumping up -L, but of course the question is, bump it to what? What am I locking? (That's a rhetorical question, though I *do* want to know.)

Here's another test: generate the script and run it on a newly-started sports DB, then check promon lock HWM. Answer: 112.
Try again with just the Invoice table: HWM 94.
Try again with "set transaction isolation level read uncommitted;" at the top of the script: unhappy sqlexp.
Code:
=== SQL Exception 1 ===
SQLState=HY000
ErrorCode=-20105
[DataDirect][OpenEdge JDBC Driver][OpenEdge] Statement not allowed in readonly isolation level (7671)
(repeated once per table)
Note: according to SQLExplorer.properties, these other tests have been done with Read Committed.
Delete all invoice records but one, run again, just on Invoice: HWM 94. So it seems the amount of data in the table isn't an issue. Maybe the issue is schema size?
Restore the DB.
Update only table statistics (Invoice): 3.
Update only index statistics (Invoice): 42. (This table has 4 indexes.)
Update only column statistics (Invoice): 324. (Huh???) (This table has 8 columns.)
Try again, column statistics (Invoice): 330. (What????) Sigh.
Update only all column statistics (Invoice): 20.

Maybe locking the table helps? Test of Invoice with "lock table pub.invoice in share mode;" at the start of the script. HWM: 94. No change.
Test of Invoice with "lock table pub.invoice in exclusive mode;" at the start of the script. HWM: 73. A bit of a change, not much.

Tried all stats types again after removing field invoice.ship-charge from the schema: no change.

This is confusing, to say the least. :confused:
 

cj_brandt

Active Member
Is the schema the same in the 2 databases you are testing ?

Is auditing enabled on both or neither ?
 
Top