Answered: Over locking?

Chris Hughes

ProgressTalk.com Sponsor
Hi All

I'd appreciate your opinions please on something I don't feel is quite right......

We have a back office logistics system running on Progress with approx. 80 users / system users on it at peak. We are experiencing lots of contention with locking, which has led me to check out promon. My stats for just over 554 hours are:

Code:
      Commits    31794654      16.0            Undos          322      0.0
Record Updates      641856      0.3    Record Reads  6837760789    3440.8
Record Creates      553839      0.3  Record Deletes      300993      0.2
    DB Writes      1315292      0.7        DB Reads      5490877      2.8
    BI Writes      183905      0.1        BI Reads        94692      0.0
    AI Writes            0      0.0
  Record Locks    144061838      72.5    Record Waits          54      0.0
  Checkpoints          80      0.0    Buffs Flushed          546      0.0

Things that I think may concern me

DB writes to record locks ratio - about 1% (roughly 100 locks for every DB write)
Why do I have more commits than writes?

I'm not expecting any answers on how to fix this, I just wonder whether there is any justification for writing an application this way!

Thanks in advance.

Chris.
 

RealHeavyDude

Well-Known Member
Commits relate to transactions ( which are operations on the logical database model ) whereas DB writes relate to database blocks ( which are part of the physical database storage model ). Therefore there is no direct relation between them. When you update several records in one transaction they might be contained in the same DB block or split across many DB blocks ...

If you want to save commits then you would need to reduce the number of transactions and increase the transaction scope - which in almost all cases is a big, big, big NO GO. In almost all cases you want your transaction to be as small as possible. But, you might want to consider re-designing programs that delete lots of records to group them in batches of 500 to 1,000 records ( or more - your mileage may vary ) where feasible.
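By way of illustration only - this is a minimal sketch, not your application's code, and the "Customer" table and "Flagged" field are made-up names - a batching pattern in ABL might look like this:

Code:
/* Batched updates: each iteration of the labelled REPEAT is one transaction
   that touches at most 500 records, so at most 500 locks are held at a time. */
DEFINE VARIABLE i AS INTEGER NO-UNDO.

batch-loop:
REPEAT TRANSACTION:
    i = 0.
    FOR EACH Customer EXCLUSIVE-LOCK WHERE Customer.Flagged = FALSE:
        Customer.Flagged = TRUE.
        i = i + 1.
        IF i >= 500 THEN NEXT batch-loop.  /* commit and start a new transaction */
    END.
    LEAVE batch-loop.                      /* nothing left to update */
END.

The WHERE clause excludes the records already processed, so each new transaction simply carries on where the last one stopped.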

If you want to reduce the number of DB writes then there might be potential to performance-tune the database - but that largely depends on how you run your database now. In order to give advice we would need you to post a lot more information, like the OS you are running on, the exact Progress/OpenEdge version + bitness, the current structure of your DB, the blocksize of the DB and your database startup parameters - for starters.


Heavy Regards, RealHeavyDude.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Also, if you're going to look at promon stats, get them from a short period of time at peak load or at the time of lock contention. That will be more meaningful than several weeks' stats taken together.

Also post your lock table size and high water mark (promon screen 6). If you have contention issues then your application may be taking too many locks or holding locks for too long.
 

Chris Hughes

ProgressTalk.com Sponsor
Heavy - thanks for explaining the commit / write link, or rather the lack of one. I was confident in my commit knowledge from previous coding efforts but was taking the writes as record writes - that's where I got confused.

The DB setup is pretty good these days: 64-bit 10.2B, Type II storage areas, indexes rebuilt, SQL stats run - all from previous help on this forum - so I'm content the engine is good, although I've no doubt the startup params / client params could do with a tweak.

I have no access to the code - I'm an end user for these purposes.

Rob - stats from promon screen 6 below; I'll show the activity from this afternoon in a later post.
Code:
                  Number of database buffers (-B): 1500000
        Number of database alternate buffers (-B2): 0
          Number of before image buffers (-bibufs): 25
          Number of after image buffers (-aibufs): 25
                  Excess shared memory size (-Mxs): 66
              Before-image truncate interval (-G): 0
                          No crash protection (-i): Not enabled
        Maximum private buffers per user (-Bpmax): 64
                Current size of locking table (-L): 100000
                      Locking table entries in use: 7
                    Locking table high water mark: 8880
        Maximum number of clients per server (-Ma): 3
        Max number of JTA transactions (-maxxids): 0
                Delay of before-image flush (-Mf): 3
                  Maximum number of servers (-Mn): 50
                      Maximum number of users (-n): 121
                    Before-image file I/O (-r -R): Raw
                      Shared memory version number: 6412385
                        Number of semaphores used: 175
                                    Broker status: Executing
                                  BI Writer status: Executing
                                  AI Writer status: Not executing
                                  Watchdog status: Executing
                            Number of page writers: 2
                      Number of self-service users: 0
                            Number of remote users: 58
                                Number of servers: 48
                              Number of shut-downs: 0
                                Number of monitors: 1
                            LRU Disabled (-nolru): No
                      LRU force skips (-lruskips): 0
                    LRU2 force skips (-lru2skips): 0
              Server message wait time (-Nmsgwait): 2
    Delay first prefetch message (-prefetchDelay): Disabled
Prefetch message fill percentage (-prefetchFactor): 0
Minimum records in prefetch msg (-prefetchNumRecs): 16
Suspension queue poll priority (-prefetchPriority): 0

Thanks guys.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Your lock table has been set much higher than the default of 8192. Typically this happens because a client blows the lock table and someone asks the DBA (or whoever controls the DB startup params) to bump up -L. This is caused by lousy code with scoping issues. And with a high water mark of 8880, you have some code that's taking a lot of locks. This makes record contention issues much more likely.
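To illustrate the kind of thing I mean - a contrived fragment, not from your code, with invented table and field names - something like this holds every lock it acquires until the outer transaction ends, so the lock count grows with the number of records processed:

Code:
/* Anti-pattern: the DO TRANSACTION wraps the whole FOR EACH, so every
   record touched stays EXCLUSIVE-locked until the final END. */
DO TRANSACTION:
    FOR EACH OrderLine EXCLUSIVE-LOCK WHERE OrderLine.Shipped = FALSE:
        OrderLine.Shipped = TRUE.
    END.
END.  /* all of the locks are released only here */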

Also:
  • -bibufs and -aibufs are low;
  • you're not running an AIW; is this an oversight or is AI not enabled? Either way it should be addressed.
 

Chris Hughes

ProgressTalk.com Sponsor
Hi Rob

Yes, I'm suspecting code - but as I can't see the source, that's why I'm asking about the likelihood. Can you imagine an end user going to a software house and stating that lousy code is causing issues? :eek:

I can look into reducing the -L potentially.

I thought I was happy with my bibufs - I have no waits and bytes written per second is 700 - does that justify an increase, do you think? The DB in question is 10 GB, which I'm sure is tiny in your world.

AI AI AI, I know - on the list:oops:

Promon for a busy hour and a half this afternoon

Code:
Activity  - Sampled at 07/29/13 16:18 for 1:24:44.
 
Event                Total  Per Sec  Event                Total  Per Sec
      Commits      229311      45.1            Undos            0      0.0
Record Updates        9279      1.8    Record Reads    30863946    6070.8
Record Creates        7997      1.6  Record Deletes        1848      0.4
    DB Writes        15272      3.0        DB Reads          58      0.0
    BI Writes        1754      0.3        BI Reads            3      0.0
    AI Writes            0      0.0
  Record Locks      497017      97.8    Record Waits            1      0.0
  Checkpoints            1      0.0    Buffs Flushed            0      0.0
 
Rec Lock Waits    0 %    BI Buf Waits      0 %    AI Buf Waits      0 %
Writes by APW    97 %    Writes by BIW    52 %    Writes by AIW    0 %
Buffer Hits    100 %    Primary Hits    100 %    Alternate Hits    0 %
DB Size        9705 MB      BI Size      64 MB      AI Size      0 K
FR chain                  2954 blocks  RM chain                    4 blocks
Shared Memory  12194M        Segments      1

Cheers!
 

Chris Hughes

ProgressTalk.com Sponsor
This is bad?

Code:
16:31:53        07/06/13 08:50 to 07/29/13 15:17 (558 hrs 27 min)
 
                                    Total        Per Min          Per Sec          Per Tx
 
Waits:
    Share                              2              0            0.00            0.00
    Intent Share                        0              0            0.00            0.00
    Exclusive                    6484694            194            3.23            0.20
    Intent Exclusive                    0              0            0.00            0.00
    Share Intent Excl                  0              0            0.00            0.00
    Upgrade                            2              0            0.00            0.00
    Record Get Lock                    0              0            0.00            0.00
    Table Lock                          0              0            0.00            0.00
    Record Lock                  6484701            194            3.23            0.20
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Yes, I'm suspecting code - but as I can't see the source, that's why I'm asking about the likelihood. Can you imagine an end user going to a software house and stating that lousy code is causing issues? :eek:

If a user has blown the lock table in the past, you will see evidence of it in the database log. Look for such messages to narrow down who the users are. Then you can go looking for their client logs, or just talk to them, to get an idea of what they were running at the time of the error.

I can look into reducing the -L potentially.

I thought I was happy with my bibufs - I have no waits and bytes written per second is 700 - does that justify an increase, do you think? The DB in question is 10 GB, which I'm sure is tiny in your world.

It's not so much the DB size that matters. You need well-written code to avoid record contention. When you say "no waits" do you mean record waits or empty buffer waits? If you don't have any empty BI buffer waits since the start of the DB then the value you have for -bibufs isn't hurting you right now, given your transaction rate. But that can change without notice, and bibufs aren't expensive.
 

TomBascom

Curmudgeon
The fact that you have what seems like a large number of locks, by itself, doesn't tell you much. With 500+ hours of stats it could be a single poorly written update program that only runs once a month. Or it could be an epidemic of poor record and lock scoping.

Smaller samples help to narrow it down.

Your shorter sample seems to show an average of 2 locks per commit (97.8 locks per second, 45.1 commits/sec). I don't think that is indicative of an epidemic of poor locking. So the big number you see in the first post is more likely from a small set of programs or even a single bad program.

Looking at "locks by user" may also help you focus on where this is coming from.
 

TomBascom

Curmudgeon
If the number of locks needed increases as the db size does (or as a particular table grows) then you probably have a case of confusion where someone has elected to implement an inappropriate business transaction using database transaction semantics. They may, for instance, have decided that some operation such as a month-end update must be "all or nothing" rather than "continuable" and "reversible". At small volumes this is usually harmless -- but when an application scales it can become a very serious problem.
 

TomBascom

Curmudgeon
you might want to consider re-designing programs that delete lots of records to group them in batches of 500 to 1,000 records

I have found that grouping works very well for CREATE and ASSIGN (record update operations).

It works quite poorly for DELETE. DELETE operations are fastest when performed one record at a time. (As I understand it, this is primarily because of the deleted-entry placeholders in the index blocks.)
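In other words, for deletes something as simple as this is usually the fastest approach - one record, one transaction. (Just a sketch; "AuditLog" and "LogDate" are made-up names, and the retention rule is invented.)

Code:
/* One record per transaction: the TRANSACTION keyword on the FOR EACH
   makes each iteration its own transaction, so only one record is
   locked and deleted at a time. */
FOR EACH AuditLog EXCLUSIVE-LOCK
    WHERE AuditLog.LogDate < TODAY - 90
    TRANSACTION:
    DELETE AuditLog.
END.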
 

TomBascom

Curmudgeon
... We are experiencing lots of contention with locking ...

Can you expand on that? Do you mean that users get messages like: "customer is in use by Rob on tty123" and have to wait? Or do you mean something else?
 

Chris Hughes

ProgressTalk.com Sponsor
If a user has blown the lock table in the past, you will see evidence of it in the database log. Look for such messages to narrow down who the users are. Then you can go looking for their client logs, or just talk to them, to get an idea of what they were running at the time of the error.

I can't see anything in the logs to suggest this. I searched for "lock".

It's not so much the DB size that matters. You need well-written code to avoid record contention. When you say "no waits" do you mean record waits or empty buffer waits? If you don't have any empty BI buffer waits since the start of the DB then the value you have for -bibufs isn't hurting you right now, given your transaction rate. But that can change without notice, and bibufs aren't expensive.


I have no empty buffer waits and only 41 busy waits over 577 hours.
I have lots of exclusive lock waits though (my last post); I suspect shared locks are overused and this is why.
I'll take your advice and increase the bibufs to 50.

Thanks
 

Chris Hughes

ProgressTalk.com Sponsor
Your shorter sample seems to show an average of 2 locks per commit (97.8 locks per second, 45.1 commits/sec). I don't think that is indicative of an epidemic of poor locking. So the big number you see in the first post is more likely from a small set of programs or even a single bad program.

Yes, I didn't think about that ratio. I'm still concerned that the record creates / deletes / updates are fewer than the commits though.

Looking at "locks by user" may also help you focus on where this is coming from.


I know where to do this live - is there an historic record available?

Thanks
 

Chris Hughes

ProgressTalk.com Sponsor
Can you expand on that? Do you mean that users get messages like: "customer is in use by Rob on tty123" and have to wait? Or do you mean something else?


Hangs, I suspect - we never get any messages like your example. Could a parameter be set to suppress the message you describe, or would that have been done at code level?

There's a background process that runs and seemingly sits and waits until it gets its lock. This clashed with a user in the GUI who was also sitting and waiting; the user did an end task, I killed the user's DB connection, and the background job continued. My suspicion is that shared locks were used and one or both were trying to upgrade.
 

RealHeavyDude

Well-Known Member
Please don't mind me - but SHARE-LOCK is a classic. Since Progress is backwards compatible to the bone, SHARE-LOCK is still the default lock nowadays if not specified otherwise in the code. Especially when parts of your application code are very old, chances are that the lock specification is missing here and there.

One option to work around it is to specify the -NL parameter for the client startup, which will change the default lock to NO-LOCK. It could be your silver bullet, but it could also have negative side effects on application behavior if an unspecified SHARE-LOCK is used deliberately in your application.

The other classic might be transaction scope ( for example pessimistic locking ) and/or buffer scope in your application being larger than it needs to be. The worst case scenario is when the buffer scope is larger than the transaction scope: at the end of the transaction the EXCLUSIVE-LOCK is automatically downgraded to a SHARE-LOCK by the AVM, regardless of whether you specified -NL or not. There is nothing you can do against poorly designed ( or maybe in that case NOT designed ) transaction and/or buffer scope from the database administrator's point of view - you would need to fix the code.
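For example - a contrived fragment with invented names, just to show the mechanics - here the buffer is scoped to the whole procedure but the transaction ends at the inner END, so the lock is downgraded rather than released:

Code:
FIND FIRST Customer NO-LOCK WHERE Customer.CustNum = 1.  /* scopes the buffer to the procedure */

DO TRANSACTION:
    FIND CURRENT Customer EXCLUSIVE-LOCK.
    Customer.Comments = "updated".
END.  /* the transaction ends here, but the Customer buffer is still in
         scope, so the EXCLUSIVE-LOCK is downgraded to a SHARE-LOCK
         instead of being released - other users now block on it */

/* ... lots of other processing while the SHARE-LOCK is still held ... */

One way to avoid it is to scope the buffer and the transaction together, e.g. DO FOR Customer TRANSACTION: ... END.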

Heavy Regards, RealHeavyDude.
 