DB abnormal shutdown due to appserver terminated abnormally

saurabhexp

New Member
In recent couple of weeks, DB gets shutdown because watchdog removes the dead user holding buffer lock. Below is the log entry of AppServer broker. This happens just before DB goes down.

How to troubleshoot this issue, why this is happening across multiple servers for different sites?


[22/10/25@20:45:00.718-0700] P-012216 T-C-0012 2 UB Basic The ASK major version in the CONNECT request is 0. AppServer Keepalive protocol is disabled for this connection. (13765)
[22/10/25@20:45:22.451-0700] P-012216 T-S-0157 1 UB ----------- ServerSocketsIPC read() IOException : java.net.SocketException: Connection reset : Connection reset (8127)
[22/10/25@20:45:22.451-0700] P-012216 T-S-0157 1 UB ----------- ServerIPCException in getServerIPCMsg() : ServerIPC error: ubServerSocketsIPC.read() IOException: Connection reset. (8119)
[22/10/25@20:45:22.463-0700] P-012216 T-S-0157 1 UB ----------- Posted EAbnormalShutdownServerEvent for PID: 21780
[22/10/25@20:45:22.464-0700] P-012216 T-C-0012 1 UB ----------- Server Terminated Abnormally (8420)
[22/10/25@20:45:22.465-0700] P-012216 T-S-0158 2 UB Basic Started server thread: S-0158. (8101)
 

saurabhexp

New Member
Progress Version 11.6.3.011
OS - Windows Server 2012 R2 Standard

DB log entries
[2022/10/23@00:30:00.394-0400] P-25360 T-23076 I APPSRV142: (7129) Usr 142 set name to sysprogress.
[2022/10/23@00:30:00.455-0400] P-22884 T-23144 I APPSRV133: (7129) Usr 133 set name to sysprogress.
[2022/10/23@00:30:03.745-0400] P-36836 T-20580 I APPSRV138: (7129) Usr 138 set name to sysprogress.

[2022/10/23@00:30:43.762-0400] P-6200 T-6204 I WDOG 129: (2523) User 133 died with 1 buffers locked.
[2022/10/23@00:30:43.766-0400] P-6200 T-6204 I WDOG 129: (2527) Disconnecting dead user 133.
[2022/10/23@00:30:43.778-0400] P-4212 T-5516 I AIW 128: (2520) Stopped.

[2022/10/23@00:30:45.044-0400] P-6200 T-6204 I : (453) Logout by SYSTEM on batch.
[2022/10/23@00:30:45.575-0400] P-7220 T-8628 I SQLREMC 932: (-----) Disconnected due to shutdown of primary database: D:\XX\XX\wrk\db\docutap
[2022/10/23@00:30:45.576-0400] P-7220 T-8628 I SRV 4: (453) Logout by SYSTEM on CON:.
[2022/10/23@00:30:45.737-0400] P-6976 T-11716 I SQLREMC 936: (-----) Disconnected due to shutdown of primary database: D:\XX\XX\wrk\db\docutap
[2022/10/23@00:30:45.738-0400] P-6976 T-11716 I SRV 2: (453) Logout by SYSTEM on CON:.
[2022/10/23@00:30:46.174-0400] P-560 T-584 I BROKER 0: (15192) The database will complete shutdown within approximately 60 seconds.
[2022/10/23@00:30:46.174-0400] P-560 T-584 I BROKER 0: (2249) Begin ABNORMAL shutdown code 2
[2022/10/23@00:30:46.186-0400] P-7064 T-5512 I SQLREMC 933: (-----) Disconnected due to shutdown of primary database: D:\XX\XX\wrk\db\docutap
[2022/10/23@00:30:46.187-0400] P-7064 T-5512 I SRV 3: (453) Logout by SYSTEM on CON:.
[2022/10/23@00:30:47.187-0400] P-560 T-584 I BROKER 0: (2263) Resending shutdown request to 10 user(s).
[2022/10/23@00:30:48.844-0400] P-30248 T-30984 I APPSRV140: (453) Logout by sysprogress on batch.
[2022/10/23@00:30:55.233-0400] P-36836 T-20580 I APPSRV138: (453) Logout by sysprogress on batch.


App Server Broker log entries
[22/10/23@00:30:00.697-0400] P-010832 T-C-0003 2 UB Basic Client disconnected : 10.80.46.155. (8534)
[22/10/23@00:30:07.667-0400] P-010832 T-S-0176 1 UB ----------- ServerSocketsIPC read() IOException : java.net.SocketException: Connection reset : Connection reset (8127)
[22/10/23@00:30:07.668-0400] P-010832 T-S-0176 1 UB ----------- ServerIPCException in getServerIPCMsg() : ServerIPC error: ubServerSocketsIPC.read() IOException: Connection reset. (8119)
[22/10/23@00:30:07.676-0400] P-010832 T-S-0176 1 UB ----------- Posted EAbnormalShutdownServerEvent for PID: 22884
[22/10/23@00:30:07.677-0400] P-010832 T-C-0008 1 UB ----------- Server Terminated Abnormally (8420)
[22/10/23@00:30:07.688-0400] P-010832 T-C-0008 2 UB Basic Client disconnected : 10.80.46.155. (8534)
 

TomBascom

Curmudgeon
When WDOG tells you that a process dies with buffers locked the "buffer" is a block in -B, not a record buffer (just to get that confusion out of the way). This _usually_ happens because someone killed the session (usr 133 in this case). On UNIX that is often a "kill -9". Terminating a connection in such a way does not permit the session to clean up and if it was holding a latch (aka a "buffer lock") nobody else will be able to get that latch since the dead process cannot ever release it and the information needed to undo whatever it was doing was in the local memory of the now-dead process. Sooner or later anyone trying to get any work done is going to queue for that latch and, eventually, everyone will be blocked waiting for the lock to be released. ("Eventually" sounds like a long time and from the CPU perspective it could be, but it may not be very long in human time.) So WDOG notices the problem and says, "crap, I can't fix this, we need to shutdown".

You're running Windows so it wasn't "kill -9" but there are plenty of equivalents:
  • when you terminate a process with task manager
  • "x"'ing a window that is running a shared memory connection on the console ("because it is faster" (which it is)), but this is an appserver so that isn't what happened in this case
  • logging out - yes the mere act of logging out of Windows results in Windows "very helpfully" terminating all processes running with that same userid - including background processes and scheduled tasks
  • bugs
In this case, if the time stamps are correct, the root cause is whatever happened to trigger the error 8127 in the appserver broker log. On the surface it looks like something external reset the connection. Which, I guess, is stating the obvious. It would, of course, be helpful to upgrade to a supported release, there have been lots of bug fixes and improvements since 11.6 and Windows Server 2012 that might address whatever is happening here. If you have some sort of idea what is triggering that error you could also try addressing it from that side.

I would also look for protrace files (possibly "protrace.010832"). If a protrace was generated that might provide an important clue regarding what is going wrong.

You might also increase the logging level in the appservers. Maybe it would help to know what requests the appserver was processing when this error occurs. If there is something consistent happening there that might lead somewhere helpful.
 

dimitri.p

Member
The most equivalnet to kill -9 on windows is taskkill /IM exenamehere /F ( taskkill /? to find out all the interesting ways to wreck havoc with it ).

Since this started happening at multiple sites, event logs on the client and the server should be looked at as well , to see what the OS is reporting, if anything, 3 minutes before and three minutes after the database log time.
 
Top