Rob Fitzpatrick
ProgressTalk.com Sponsor
I'm preparing to run Update Statistics in a client's prod environment (AIX 7.1, OE 11.6.0). First I ran it in a test DB on a different system (SLES 11.1, OE 11.6.3) and didn't have any errors. The test DB was created via D&L, not prorest. This is the code I run to create a SQL script:
On the internal SLES test box, where the SQL script succeeded, this is an excerpt from the log:
Note the (17741) message near the beginning. It was run via sqlexp with just the required settings. I didn't change the session's transaction isolation level.
Next we ran it on the client's test server (AIX 7.1, OE 11.6.0), on a copy (via prorest) of the prod DB. The log looked like this:
Note the absence of a (17741) message. I don't know the significance of this; I don't see anything in the SP01-SP03 release notes related to Update Statistics and locking. This script was run the same way, with the same settings, as the first run which succeeded.
This was done after hours and there shouldn't have been anyone else logged in and taking lots of locks (I haven't yet proven this conclusively). But I doubt the LTO was caused by other (ABL) client activity. This is a secondary application DB where there are few locks taken generally, and it's a test environment. Also, the Update Statistics was later run against the primary application database and it also threw LTO errors.
What's also odd is that there seems to be no correlation between table sizes and the overflows. These are the tables whose updates were current during the lock table overflows, and their record counts and sizes in bytes (from dbanalys):
Finally, from the docs on Update Statistics:
"Obtaining statistics does not require an exclusive lock on the
schema or any table locks. Rows written to statistics tables will be exclusively locked, as in
every transaction performing updates. Therefore, statistics can be obtained while normal
database operations continue."
Questions:
Code:
/* Generate a SQL UPDATE STATISTICS script to be run from sqlexplorer.
From KB article P115266 (S000011034)
https://knowledgebase.progress.com/articles/Article/P115266/p
To run the output of this program:
sqlexp -db <dbname> -S <SQL broker port>
-infile UpdateStatistics_<dbname>.sql
-outfile UpdateStatistics.out
-user <sql username> -password <password>
where <username> is a DB user with SQL access
*/
define variable v-tbl-has-recs as logical no-undo.
define variable bh-table-buffer as handle no-undo.
output to value( "updatestatistics_" + ldbname( "dictdb" ) + ".sql" ).
for each dictdb._file no-lock where _file._file-number > 0 and _file._file-number < 32768:
create buffer bh-table-buffer for table _file._file-name.
v-tbl-has-recs = bh-table-buffer:find-first( "", no-lock ) no-error.
if v-tbl-has-recs then /* only update the statistics if the table contains data */
do:
put unformatted "update table statistics and index statistics and all column statistics for PUB." quoter( _file._file-name )";" skip.
put unformatted "commit work;" skip.
end.
delete object bh-table-buffer.
end.
output close.
On the internal SLES test box, where the SQL script succeeded, this is an excerpt from the log:
Code:
SRV 3: (8873) Login usernum 66, remote SQL client.
SRV 3: (7129) Usr 66 set name to sysprogress.
SQLSRV2 3: (-----) Login by user 66. Using TCP/IP IPV4 address 127.0.0.1
SRV 3: (17741) Enabling table locks.
SQLSRV2 3: (10996) %Database statistics (table,index,column) updated for table PUB.app-asset-liab.
SQLSRV2 3: (10996) %Database statistics (table,index,column) updated for table PUB.app-cif.
... (other table messages omitted for brevity)
SQLSRV2 3: (10996) %Database statistics (table,index,column) updated for table PUB.xml-schema.
SQLSRV2 3: (10996) %Database statistics (table,index,column) updated for table PUB.xml-setup.
SQLSRV2 3: (10996) %Database statistics (table,index,column) updated for table PUB.xml-transaction.
SQLSRV2 3: (-----) Logout by user 66.
SRV 3: (453) Logout by sysprogress on .
Note the (17741) message near the beginning. It was run via sqlexp with just the required settings. I didn't change the session's transaction isolation level.
Next we ran it on the client's test server (AIX 7.1, OE 11.6.0), on a copy (via prorest) of the prod DB. The log looked like this:
Code:
SQLSRV2 14: (-----) Login by user 79. Using TCP/IP IPV4 address 16.86.186.224
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.app-asset-liab.
SRV 14: (915) Lock table overflow, increase -L on server
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.app-cif-addr.
...
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.app-security-bldg.
SRV 14: (915) Lock table overflow, increase -L on server
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.company-interface.
...
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.ISO-Setup.
SRV 14: (915) Lock table overflow, increase -L on server
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.loan-app-detail.
...
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.sys-ctrl.
SRV 14: (915) Lock table overflow, increase -L on server
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.trans-log.
SQLSRV2 14: (10996) Database statistics (table,index,column) updated for table PUB.xml-map.
Note the absence of a (17741) message. I don't know the significance of this; I don't see anything in the SP01-SP03 release notes related to Update Statistics and locking. This script was run the same way, with the same settings, as the first run which succeeded.
This was done after hours and there shouldn't have been anyone else logged in and taking lots of locks (I haven't yet proven this conclusively). But I doubt the LTO was caused by other (ABL) client activity. This is a secondary application DB where there are few locks taken generally, and it's a test environment. Also, the Update Statistics was later run against the primary application database and it also threw LTO errors.
What's also odd is that there seems to be no correlation between table sizes and the overflows. These are the tables whose updates were current during the lock table overflows, and their record counts and sizes in bytes (from dbanalys):
Code:
table records size (b)
app-asset-liab 4,026 400,486
app-security-bldg 190,365 13,107,200
ISO-Setup 8 361
sys-ctrl 16 639
Finally, from the docs on Update Statistics:
"Obtaining statistics does not require an exclusive lock on the
schema or any table locks. Rows written to statistics tables will be exclusively locked, as in
every transaction performing updates. Therefore, statistics can be obtained while normal
database operations continue."
Questions:
- What is the significance, if any, of the 17441 message (11.6.3 on Linux)?
- Why would the Update Statistics cause LTO erorrs when it did, especially when there are other tables in the same DB with millions of records and GB of data, where it didn't overflow the lock table?
- Apart from the records being written in the handful of SQL_STAT tables, what should be locked? Seems like nothing, according to the docs.
- Has anyone else experienced such issues during a mass Update Statistics?