Again Progress 7 crashes with Lock table overflow, increase -L on server (915)

progressproger

New Member
Hi.

I work with the Progress 7.3D on HP-UX B.10.20 A 9000/785 and, therefore, cannot use anything like _Lock, _UserLock, etc.

A couple of weeks ago I had the test environment DB replaced by that from the production. Since there, not much changes were made to the test base. But several days ago the nightly scheduled activity started crashing with "Lock table overflow, increase -L on server (915)". On production, everything is working well. The source code is rather old (about 10 years old) and had been written before I was employed here.

The script that is crashing generally looks like

Code:
   def buffer be_CustomerBills for CustomerBills.
   def buffer be_CustomerServiceChanges for CustomerServiceChanges.
   
   for each CustomerBills (a few dozens of thousands) no-lock:
   
      some minor read-only activity and 1 record is created in another table
   
      TRANLABEL:
      do transaction:
         some minor read-only activity

         for each CustomerServiceChanges no-lock where (correspond to a current CustomerBills and are unprocessed):
             find first be_CustomerServiceChanges exclusive-lock where  recid(be_CustomerServiceChanges) = recid(CustomerServiceChanges) no-wait  no-error.
             if could not get lock on be_CustomerServiceChanges then
                undo TRANLABEL, leave TRANLABEL.
   
             some rather complex activity with the modification of about 20 related  tables - not more than few dozens of locked records
   
         end. /* for each CustomerServiceChanges */

         if CustomerBills needs to be changed then
         do:
             find first be_CustomerBills exclusive-lock where  recid(be_CustomerBills) = recid(CustomerBills) no-wait  no-error.
             if could not get lock on be_CustomerBills then
                undo TRANLABEL, leave TRANLABEL.
             else
                modify 2 fields of be_CustomerBills and release be_CustomerBills
          end.  
      end. /* TRANLABEL */
   
      delete the record in another table created at the beginning of for each CustomerBills
  end. /* for each CustomerBills */


The crash happens on the very first CustomerBills being processed on one of its CustomerServiceChanges records.

Under the rather complex activity I mean procedures that are used in hundreds of places across the whole big application, and this is why the TRANLABEL transaction is so wide - it's better to undo the big TRANLABEL transaction than cleaning anything out manually from all the about 20 tables.

I don't expect more than a few dozens of records to be locked within the TRANLABEL transaction block at most, but the promon shows about 1700 simultaneous locks kept by the user I execute the script on behalf of.

The -L parameter is now 12000 (when this started a few days ago, it was 10000) which is much more than 1700, but the crash does happen.

I did not increase the -L value more this time before I figure out this strange behavior. And this is also something to take long, because I don't have permissions to do this and getting this done by admins takes long to go thru several approvals chain.

When such problem happened before in different places, increasing helped for just a few months, and then I had to increase it again and again. Before I go ahead and increase again, I would like to figure out more than I used to when increasing -L in the past, because increasing helps very temporarily, and also I don't want to give -L too large value in order not to decrease the overall performance which is great now.

I also played with the transaction scope - I removed the transaction work from the TRANLABEL's do statement and put only actual changes of be_CustomerServiceChanges and be_CustomerBills into separate transactions, but it always crashed on attempting to get an exclusive lock either on be_CustomerServiceChanges or on be_CustomerBills.

In other words, changing transaction scope did not help at all.

What is strange also - every next script execution processes one be_CustomerServiceChanges record more before the crash. in other words, if now it crashed on some particular be_CustomerServiceChanges record, next time it will process this record and will crash on the next be_CustomerServiceChanges record.

Any ideas on what should I do? I don't have a budget to rewrite what's described above as "some rather complex activity with the modification of about 20 related tables - not more than few dozens of locked records".

I am going to request the DB rebuild via creating a new empty DB and copying everything over there, but the approval will take me some while and I cannot wait.

Thanks a lot for ideas.
 
First thing I would do is to move the transaction scope to inside the for each CustomerServiceChanges no-lock
 
The transaction keyword in the "do transaction" isn't doing anything for you. The transaction is scoped to the outer-most iterating block with a database update, which is your "for each CustomerBills" block. And that assumes you aren't already in a transaction when the provided code runs.
 
Ah yes I didn't spot that bit - hard to read without proper indentation and code tags :)
 
You do realise that Progress 7 is about as ancient as dinosaurs? You really need to upgrade to a later release.
 
First thing I would do is to move the transaction scope to inside the for each CustomerServiceChanges no-lock

Although changes within for each CustomerServiceChanges no-lock and out of it need to be done within a transaction, I tried but everything remains the same.
 
The transaction keyword in the "do transaction" isn't doing anything for you. The transaction is scoped to the outer-most iterating block with a database update, which is your "for each CustomerBills" block. And that assumes you aren't already in a transaction when the provided code runs.

I just tried to remove the TRANSLOOP block and added the transaction word to for each CustomerBills and when I need to undo a transaction, I undo this new one.

Nothing has changed at all and the behavior is exactly the same no matter what are transaction scopes. LOL :)
 
Ah yes I didn't spot that bit - hard to read without proper indentation and code tags :)

Oh, sorry, initially I did not find it in the message editor, but when I specified the CODE tag manually, it looks much better :)

You do realise that Progress 7 is about as ancient as dinosaurs? You really need to upgrade to a later release.

I do realize, but this is not a kind of a decision I am into :)
 
As a summary, I tried a lot of combinations with transaction scoping and nothing has changed at all.

I will try to figure out if this is a DB file corruption, but if there is no corruption, I have no idea of what to do with this.

Recently I had same problem with another script which is very simple and consists of 3 identical no-lock loops in which the record of the related table gets found and basing on it, a couple of the loop's table fields are updated. It crashed in the same way just while looping even without having to update (this is why I suspect the DB corruption). When I introduced a flag saying if an update needed and checked it at the loop end and made changes if necessary (with locking exclusively) within a transaction, the problem went away.

With the currently crashing script, things are much more complex.

What I can't understand - it used to work well, but some night it started crashing, taking into account that no billing changes were made (this is a test box that is unused when no release needs to be tested).
 
  1. Do you have a test environment, with equivalent code and data, where you can reproduce this issue?
  2. What is your current setting for -L?
  3. Based on your current code and the data in the relevant tables, how many concurrent locks do you expect the code to use?
  4. Based on the current code (and the code that runs it), what do you believe your transaction scope to be?
  5. The lock table is a shared resource on the database. Is there any other code running concurrently that could be taking large numbers of locks?
 
The first thing I would do is to add:
Code:
message transaction.
pause.
to the top of the program to verify that there is not a transaction inherited from whatever calls this program (or a TRX scoped to the procedure block).

I would also compile with XREF to see if any of the queries are WHOLE-INDEX. If they are, and if they are not NO-LOCK, then that is likely where your lock table growth is coming from.

Lastly -- the behavior of -L increasing by 1 after a session crashes is a bug. Eventually it will eat up all of your -Mxs and the db will crash.
 
Back
Top