User xx died holding the Audit Policy Latch

Vaalen

Member
Hi,

Sometimes (I think at least once every 14 days), the first user of the day (we restart the databases every night) hangs while connecting to one of the databases. When this happens, nobody else can connect to that database.

A lot of batch processing runs between database startup and the first user's connection attempt, and every Login in the .lg file has a matching Logout.

Auditing is enabled.

We cannot find an error anywhere.
When we kill (Unix, Red Hat) the hung user process, which is using around 100% CPU, everything works again and everybody can connect and start the application.

We once found an error (see title), but we are not sure whether it was caused by our killing the process. In any case, the user it names was indeed the first one trying to connect.

We use OpenEdge 10.2A on Red Hat Linux. We use PuTTY on our PCs to start a Unix session.

Any suggestions are welcome.
 

cj_brandt

Active Member
What version of Progress OpenEdge are you using? We have an issue with an auditing-related latch not being released, and 10.2B06 may help. We can't reproduce the issue, so we can't confirm whether or not it's fixed.
 

Vaalen

Member
Thanks cj_brandt.

As mentioned, we use OpenEdge 10.2A. The problem must be related to our use of auditing.

I cannot find any Knowledge Base entry saying this issue has been fixed.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
If you want to terminate a self-service client, first disconnect it from the database(s) with proshut or promon. If it still doesn't exit after that, you can safely kill it with a Unix kill without affecting your database. Killing a database user from Unix without disconnecting it first can cause an abnormal shutdown of your database.
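A minimal Python sketch of that order of operations, assuming `proshut <db> -C disconnect <usernum>` is available on your 10.2A install (check the PROSHUT reference before relying on it). The database path, usernum, PID and grace period are hypothetical placeholders. It asks the database to disconnect the user first and only sends a Unix signal if the client process is still alive after a grace period.

```python
import os
import signal
import subprocess
import time


def proshut_disconnect_cmd(db_path, usernum):
    """Build the proshut command that asks the database to disconnect one user."""
    # Assumed syntax: proshut <db> -C disconnect <usernum>
    return ["proshut", db_path, "-C", "disconnect", str(usernum)]


def pid_alive(pid):
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another Unix user
    return True


def disconnect_then_kill(db_path, usernum, pid, grace_s=30.0):
    """Disconnect `usernum` via proshut; SIGTERM `pid` only if it outlives the grace period.

    Returns True if a Unix signal had to be sent.
    """
    subprocess.run(proshut_disconnect_cmd(db_path, usernum), check=True)
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return False  # the client exited once the database disconnected it
        time.sleep(1)
    # The database has already disconnected the user, which is what makes
    # a plain Unix kill safe according to the advice in this thread.
    os.kill(pid, signal.SIGTERM)
    return True
```

For example, `disconnect_then_kill("/db/xlogic", 57, 13923)` (hypothetical path). The sketch deliberately sends SIGTERM rather than SIGKILL (`kill -9`), which a process cannot catch or clean up after.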
 

Vaalen

Member
Thank you Rob.

We received that message 14 days ago, after issuing a shutdown.

This morning the process was killed.

But that does not solve our underlying problem: the process hangs, and nobody is able to connect to the database.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
When you say "the process hangs", do you mean a _progres process? What messages do you see in the database log when a client tries to connect?
 

Vaalen

Member
Code:
[2012/05/22@04:23:03.808+0200] P-8691       T--1022276720 I SRV     1: (8873)  Login usernum 250, remote SQL client. 
[2012/05/22@04:23:03.819+0200] P-8691       T--1022276720 I SRV     1: (7129)  Usr 250 set name to root. 
[2012/05/22@04:24:09.016+0200] P-8691       T--1022276720 I SRV     1: (453)   Logout by root on  . 
[2012/05/22@04:27:36.632+0200] P-8691       T--1022276720 I SRV     1: (8873)  Login usernum 250, remote SQL client. 
[2012/05/22@04:27:36.634+0200] P-8691       T--1022276720 I SRV     1: (7129)  Usr 250 set name to root. 
[2012/05/22@04:29:39.030+0200] P-8691       T--1022276720 I SRV     1: (453)   Logout by root on  . 
[2012/05/22@04:30:38.726+0200] P-8691       T--1022276720 I SRV     1: (8873)  Login usernum 250, remote SQL client. 
[2012/05/22@04:30:38.728+0200] P-8691       T--1022276720 I SRV     1: (7129)  Usr 250 set name to root. 
[2012/05/22@04:36:09.047+0200] P-8691       T--1022276720 I SRV     1: (453)   Logout by root on  . 
[2012/05/22@05:00:03.006+0200] P-10721      T-0     I ABL    57: (452)   Login by root on batch. 
[2012/05/22@05:00:03.616+0200] P-10721      T-0     I ABL    57: (708)   Userid is now Login. 
[2012/05/22@05:00:03.998+0200] P-10721      T-0     I ABL    57: (12699) Database xlogic Options:  
[2012/05/22@05:46:27.329+0200] P-10721      T-0     I ABL    57: (453)   Logout by Login on batch. 
[2012/05/22@06:41:06.065+0200] P-13923      T-0     I ABL    57: (452)   Login by edwinz on /dev/pts/0. 
[2012/05/22@06:41:15.862+0200] P-13923      T-0     I ABL    57: (562)   HANGUP signal received. 
[2012/05/22@06:41:22.731+0200] P-14193      T-0     I ABL    58: (452)   Login by edwinz on /dev/pts/1. 
[2012/05/22@06:46:14.479+0200] P-14193      T-0     I ABL    58: (562)   HANGUP signal received. 
[2012/05/22@06:46:24.383+0200] P-14494      T-0     I ABL    59: (452)   Login by edwinz on /dev/pts/2. 
[2012/05/22@06:48:24.932+0200] P-14494      T-0     I ABL    59: (562)   HANGUP signal received. 
[2012/05/22@06:52:34.022+0200] P-14806      T-0     I ABL    60: (452)   Login by edwinz on /dev/pts/3. 
[2012/05/22@06:56:15.822+0200] P-15095      T-0     I ABL    61: (452)   Login by edwinz on /dev/pts/4. 
[2012/05/22@07:01:02.174+0200] P-15165      T-0     I ABL    62: (452)   Login by root on batch. 
[2012/05/22@07:02:45.689+0200] P-15433      T-0     I ABL    63: (452)   Login by jos on /dev/pts/5. 
[2012/05/22@07:03:01.782+0200] P-15462      T-0     I ABL    64: (452)   Login by root on batch. 
[2012/05/22@07:03:05.469+0200] P-15095      T-0     I ABL    61: (562)   HANGUP signal received.

Code:
[2012/05/22@06:41:06.065+0200] P-13923      T-0     I ABL    57: (452)   Login by edwinz on /dev/pts/0. 
[2012/05/22@06:41:15.862+0200] P-13923      T-0     I ABL    57: (562)   HANGUP signal received.

User edwinz was the first user this morning. Because nothing happened, he probably closed the PuTTY window with its X button (hence the HANGUP signal in the log).
But process 13923 kept running on Unix: when we came in around 8:30 this morning, it was still active and using 100% CPU.
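To spot this pattern in a long .lg file, here is a small Python sketch that tracks ABL sessions by process ID and lists those with a Login (452) but no Logout (453), together with any other message numbers seen in between, such as HANGUP (562). The regular expression is an assumption inferred from the excerpts above, not a documented format.

```python
import re
from dataclasses import dataclass, field

# Assumed .lg line layout, inferred from the 10.2A excerpts in this thread:
# [timestamp] P-<pid> T-<thread> <severity> <context> <usernum>: (<msgno>) <text>
LG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+P-(?P<pid>\d+)\s+T-(?P<tid>-?\d+)\s+"
    r"(?P<sev>\w)\s+(?P<ctx>\w+)\s+(?P<usr>\d+):\s+\((?P<msgno>[-\d]+)\)\s+(?P<text>.*)$"
)

LOGIN, LOGOUT = "452", "453"


@dataclass
class Session:
    pid: int
    usernum: int
    login_ts: str
    later_msgs: list = field(default_factory=list)  # e.g. "562" for HANGUP


def unfinished_sessions(lines):
    """Return ABL sessions that logged in (452) but never logged out (453).

    Sessions are keyed by PID because usernums are reused during the day.
    SQL server logins (message 8873) are not tracked.
    """
    open_sessions = {}
    for line in lines:
        m = LG_LINE.match(line.strip())
        if not m:
            continue
        pid, msgno = int(m["pid"]), m["msgno"]
        if msgno == LOGIN:
            open_sessions[pid] = Session(pid, int(m["usr"]), m["ts"])
        elif pid in open_sessions:
            if msgno == LOGOUT:
                del open_sessions[pid]  # a clean logout closes the session
            else:
                open_sessions[pid].later_msgs.append(msgno)
    return list(open_sessions.values())
```

Run over the 05/22 excerpt, it would include P-13923 (edwinz: Login, HANGUP, no Logout) and leave out P-10721, which logged out cleanly. The file would be read with something like `unfinished_sessions(open("xlogic.lg"))`, where the file name is hypothetical.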
 

Vaalen

Member
Two weeks ago (same situation, different first user) we did a clean proshut.

Here is a portion of that .lg file:

Code:
[2012/04/10@07:53:55.921+0200] P-8690       T-0     I ABL   123: (452)   Login by john on /dev/pts/37. 
[2012/04/10@07:54:45.589+0200] P-8916       T-0     I SHUT  124: (542)   Server shutdown started by root on batch. 
[2012/04/10@07:55:00.846+0200] P-7332       T-0     I ABL   118: (562)   HANGUP signal received. 
[2012/04/10@07:55:28.094+0200] P-8470       T-0     I ABL   122: (562)   HANGUP signal received. 
[2012/04/10@08:03:49.799+0200] P-8690       T-0     I ABL   123: (562)   HANGUP signal received. 
[2012/04/10@08:06:28.962+0200] P-21461      T-0     I WDOG   51: (-----) User 57 died holding the Audit Policy latch
[2012/04/10@08:06:29.618+0200] P-8132       T-0     I ABL   120: (12524) Internal error occurred in cacheLoad, errno 3, ret -20035. 
[2012/04/10@08:06:29.618+0200] P-8132       T-0     I ABL   120: (453)   Logout by hannieh on /dev/pts/34. 
[2012/04/10@08:06:29.661+0200] P-4395       T-0     I ABL   113: (12524) Internal error occurred in cacheLoad, errno 3, ret -20035. 
[2012/04/10@08:06:29.661+0200] P-4395       T-0     I ABL   113: (453)   Logout by heddy on /dev/pts/30. 
[2012/04/10@08:06:29.686+0200] P-31239      T-0     I ABL    65: (12524) Internal error occurred in cacheLoad, errno 3, ret -20035. 
[2012/04/10@08:06:29.686+0200] P-31239      T-0     I ABL    65: (453)   Logout by wolfgk on /dev/pts/7. 
[2012/04/10@08:06:29.857+0200] P-6261       T-0     I ABL   116: (12524) Internal error occurred in cacheLoad, errno 3, ret -20035. 
[2012/04/10@08:06:29.857+0200] P-6261       T-0     I ABL   116: (453)   Logout by ilseb on /dev/pts/33.
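The watchdog message names only the usernum of the dead user, and usernums are reused during the day. Here is a small Python sketch, based on the line layout visible in the excerpts in this thread (an assumption, not a documented format), that pairs each "died holding the ... latch" message with the most recent Login (452) for that usernum earlier in the log.

```python
import re

# Both patterns are assumptions based on the .lg excerpts in this thread.
DIED = re.compile(r"User (?P<dead>\d+) died holding the (?P<latch>.+?) latch", re.IGNORECASE)
LOGIN = re.compile(
    r"\]\s+P-(?P<pid>\d+)\s+.*?\s(?P<usr>\d+):\s+\(452\)\s+Login by (?P<who>\S+)"
)


def latch_deaths(lines):
    """Pair each 'died holding the ... latch' message with the last Login for that usernum.

    Returns (usernum, latch name, (pid, login name) or None) tuples in log order.
    """
    last_login = {}  # usernum -> (pid, login name); later logins overwrite earlier ones
    deaths = []
    for line in lines:
        login = LOGIN.search(line)
        if login:
            last_login[int(login["usr"])] = (int(login["pid"]), login["who"])
            continue
        died = DIED.search(line)
        if died:
            usr = int(died["dead"])
            deaths.append((usr, died["latch"], last_login.get(usr)))
    return deaths
```

The 04/10 excerpt does not include user 57's Login, so on that excerpt alone the third element would be `None`; run over the whole day's .lg file, it would be expected to identify the first user who connected.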
 

RealHeavyDude

Well-Known Member
To me this looks like a bug, so if I were you, I would contact Progress tech support.

Heavy Regards, RealHeavyDude.
 