Abnormal shut down. dsmCursorDelete called for invalid user. rtc: -20031

gerie

Member
Yesterday the progress database shutdown.
So I'm now trying to find the reason for this shutdown.
I'm working with progress Open Edge 10.1B and it runs on a Windows 2003 machine.
The application log tells me this. I have no clue as to what dsmCursorDelete called for invalid user. rtc: -20031 means or where it comes from. But it seems somehow related?

[2010/11/15@11:54:09.094+0100] P-5136 T-7544 I SRV 14: (9437) CTRL_LOGOFF_EVENT console event received.
[2010/11/15@11:54:09.094+0100] P-6248 T-1852 I SRV 35: (453) Logout by sysprogress on .
[2010/11/15@11:54:09.126+0100] P-6876 T-7496 I SRV 36: (9437) CTRL_LOGOFF_EVENT console event received.
[2010/11/15@11:54:09.126+0100] P-5928 T-5932 I BIW 47: (2520) Stopped.
[2010/11/15@11:54:09.126+0100] P-5204 T-5232 I SRV 4: (2520) Stopped.
[2010/11/15@11:54:09.126+0100] P-5516 T-4788 I SRV 8: (2520) Stopped.
[2010/11/15@11:54:09.126+0100] P-4984 T-4944 I SRV 23: (2520) Stopped.
[2010/11/15@11:54:09.126+0100] P-944 T-5892 I SRV 31: (2520) Stopped.
[2010/11/15@11:54:09.141+0100] P-5988 T-5992 I APW 49: (2520) Stopped.
[2010/11/15@11:54:09.141+0100] P-4660 T-5508 I SRV 16: (2520) Stopped.
[2010/11/15@11:54:09.141+0100] P-6248 T-1852 I SRV 35: (7132) dsmCursorDelete called for invalid user. rtc: -20031.
[2010/11/15@11:54:09.141+0100] P-6248 T-1852 I SRV 35: (7132) dsmCursorDelete called for invalid user. rtc: -20031.
[2010/11/15@11:54:09.141+0100] P-6248 T-1852 I SRV 35: (7132) dsmCursorDelete called for invalid user. rtc: -20031.
[2010/11/15@11:54:09.141+0100] P-6248 T-1852 I SRV 35: (7132) dsmCursorDelete called for invalid user. rtc: -20031.
[2010/11/15@11:54:09.141+0100] P-1900 T-5036 I SRV 26: (2520) Stopped.
[2010/11/15@11:54:09.141+0100] P-5088 T-5464 I SRV 18: (2520) Stopped.
[2010/11/15@11:54:09.141+0100] P-5432 T-1404 I SRV 20: (2520) Stopped.
[2010/11/15@11:54:09.157+0100] P-5348 T-1032 I SRV 2: (2520) Stopped.
[2010/11/15@11:54:09.157+0100] P-5732 T-5256 I SRV 29: (2520) Stopped.
[2010/11/15@11:54:09.157+0100] P-4804 T-2668 I SRV 7: (2520) Stopped.
[2010/11/15@11:54:09.172+0100] P-4176 T-6088 I SRV 33: (2520) Stopped.
[2010/11/15@11:54:09.172+0100] P-6248 T-1852 I SRV 35: (7132) dsmStampCheckAndGet called for invalid user. rtc: -20031.
[2010/11/15@11:54:09.172+0100] P-6016 T-3140 I SRV 11: (2520) Stopped.
[2010/11/15@11:54:09.172+0100] P-6040 T-644 I SRV 24: (2520) Stopped.
[2010/11/15@11:54:09.188+0100] P-4784 T-4320 I SRV 27: (2520) Stopped.
[2010/11/15@11:54:09.204+0100] P-3020 T-5588 I SRV 6: (2520) Stopped.
[2010/11/15@11:54:09.219+0100] P-768 T-4276 I SRV 25: (2520) Stopped.
[2010/11/15@11:54:09.219+0100] P-4456 T-892 I SRV 19: (2520) Stopped.
[2010/11/15@11:54:09.219+0100] P-4472 T-5704 I SRV 21: (2520) Stopped.
[2010/11/15@11:54:09.235+0100] P-1820 T-5424 I SRV 3: (2520) Stopped.
[2010/11/15@11:54:09.235+0100] P-2796 T-5712 I SRV 10: (2520) Stopped.
[2010/11/15@11:54:09.251+0100] P-5252 T-4104 I SRV 12: (2520) Stopped.
[2010/11/15@11:54:09.251+0100] P-5872 T-5888 I SRV 32: (2520) Stopped.
[2010/11/15@11:54:09.251+0100] P-4932 T-4952 I SRV 13: (2520) Stopped.
[2010/11/15@11:54:09.266+0100] P-3060 T-2864 I SRV 34: (2520) Stopped.
[2010/11/15@11:54:09.266+0100] P-4876 T-4964 I SRV 22: (2520) Stopped.
[2010/11/15@11:54:09.266+0100] P-5788 T-4508 I SRV 30: (2520) Stopped.
[2010/11/15@11:54:09.266+0100] P-5376 T-5240 I SRV 5: (2520) Stopped.
[2010/11/15@11:54:09.282+0100] P-4792 T-4960 I SRV 28: (2520) Stopped.
[2010/11/15@11:54:09.282+0100] P-4528 T-2008 I SRV 17: (2520) Stopped.
[2010/11/15@11:54:10.282+0100] P-640 T-3148 I SRV 15: (2520) Stopped.
[2010/11/15@11:54:10.329+0100] P-5844 T-5860 I SRV 9: (2520) Stopped.
[2010/11/15@11:54:10.438+0100] P-5400 T-5404 I BROKER 0: (2249) Begin ABNORMAL shutdown code 2
[2010/11/15@11:54:11.532+0100] P-5844 T-5312 I SQLREMC 422: (-----) Disconnected due to shutdown of primary database: D:\exactcs\prod\exactcs
[2010/11/15@11:54:12.282+0100] P-640 T-3148 I SRV 15: (7132) dsmContextGetLong called for invalid user. rtc: -20031.
[2010/11/15@11:54:12.329+0100] P-5844 T-5860 I SRV 9: (7132) dsmContextGetLong called for invalid user. rtc: -20031.
[2010/11/15@11:54:13.016+0100] P-5844 T-5312 I SQLSRV2 9: (-----) Error in rss_cleanup: error code -20034 was returned by dsmUserDisconnect
[2010/11/15@11:54:13.704+0100] P-5968 T-5972 I WDOG 48: (2520) Stopped.
[2010/11/15@11:54:14.172+0100] P-6248 T-1852 I SQLREMC 406: (-----) Disconnected due to shutdown of primary database: D:\exactcs\prod\exactcs
[2010/11/15@11:54:15.454+0100] P-5400 T-5404 I BROKER : (334) Multi-user session end.
 
You haven't shown any of the log prior to:

[2010/11/15@11:54:09.094+0100] P-5136 T-7544 I SRV 14: (9437) CTRL_LOGOFF_EVENT console event received.
[2010/11/15@11:54:09.094+0100] P-6248 T-1852 I SRV 35: (453) Logout by sysprogress on .

which is where you would potentially find clues related to "why did this happen".

The -20031 stuff happens after the shutdown started and probably doesn't tell you anything about what triggered it.
 
Here is a piece of the log from before the shutdown.

[2010/11/15@11:40:27.334+0100] P-4456 T-892 I SRV 19: (-----) Previous message sent on behalf of user 398, server pid 4456, broker pid 5400
[2010/11/15@11:45:11.004+0100] P-6876 T-5856 I SRV 36: (8873) Login usernum 397, remote SQL client.
[2010/11/15@11:45:11.004+0100] P-6876 T-5856 I SRV 36: (7129) Usr 397 set name to sysprogress.
[2010/11/15@11:45:18.957+0100] P-3060 T-2864 I SRV 34: (5408) WARNING: -l exceeded. Automatically increasing from 310 to 320.
[2010/11/15@11:45:18.957+0100] P-3060 T-2864 I SRV 34: (-----) Previous message sent on behalf of user 415, server pid 3060, broker pid 5400
[2010/11/15@11:45:39.004+0100] P-7392 T-336 I SRV 37: (8873) Login usernum 396, remote SQL client.
[2010/11/15@11:45:39.004+0100] P-7392 T-336 I SRV 37: (7129) Usr 396 set name to sysprogress.
[2010/11/15@11:49:30.377+0100] P-6876 T-5856 I SRV 36: (453) Logout by sysprogress on .
[2010/11/15@11:49:30.456+0100] P-7392 T-336 I SRV 37: (453) Logout by sysprogress on .
[2010/11/15@11:50:38.408+0100] P-2796 T-5712 I SRV 10: (739) Logout usernum 405, userid POSTHUMAT, on YRD-TS24.
[2010/11/15@11:51:02.814+0100] P-2796 T-5712 I SRV 10: (5408) WARNING: -l exceeded. Automatically increasing from 200 to 210.
[2010/11/15@11:51:02.814+0100] P-2796 T-5712 I SRV 10: (-----) Previous message sent on behalf of user 439, server pid 2796, broker pid 5400
[2010/11/15@11:51:04.392+0100] P-5136 T-7808 I SRV 14: (453) Logout by sysprogress on .
[2010/11/15@11:51:04.877+0100] P-2796 T-5712 I SRV 10: (5408) WARNING: -l exceeded. Automatically increasing from 210 to 220.
[2010/11/15@11:51:04.877+0100] P-2796 T-5712 I SRV 10: (-----) Previous message sent on behalf of user 439, server pid 2796, broker pid 5400
[2010/11/15@11:51:54.861+0100] P-2796 T-5712 I SRV 10: (5408) WARNING: -l exceeded. Automatically increasing from 220 to 230.
[2010/11/15@11:51:54.861+0100] P-2796 T-5712 I SRV 10: (-----) Previous message sent on behalf of user 439, server pid 2796, broker pid 5400
[2010/11/15@11:51:54.861+0100] P-2796 T-5712 I SRV 10: (5408) WARNING: -l exceeded. Automatically increasing from 230 to 240.
[2010/11/15@11:51:54.861+0100] P-2796 T-5712 I SRV 10: (-----) Previous message sent on behalf of user 439, server pid 2796, broker pid 5400
[2010/11/15@11:51:54.923+0100] P-2796 T-5712 I SRV 10: (5408) WARNING: -l exceeded. Automatically increasing from 240 to 250.
[2010/11/15@11:51:54.923+0100] P-2796 T-5712 I SRV 10: (-----) Previous message sent on behalf of user 439, server pid 2796, broker pid 5400
 
To be complete:

The database crashed 3 times.
The event log showed that prior to the crash a certain administrator logged out from the server.
After deleting his profile the database kept on running while he logged off.

Apperently there was somthing in his profile that killed the _sqlsvr2.exe. I just don't know what and why. But the problem seems to be resolved.
 
There is time gap in your logs 11:51:54 - 11:54:09.
However your problem is probably caused by big transaction (multiple: (5408) WARNING: -l exceeded. Automatically increasing from .... to ...). If -l have been increased over 200 000 and was still increasing then you have some big transaction which either exceeds server resources or user patience (user abnormally terminates connection which sometimes results in dead client holding latches/shared memory locks). See KB for error 5408 or "latch abnormal shutdown" (P4062 seems similar although it's 9.1C).
Also I've observed 10.1B on Linux has some user/memory locking related problems (so Windows version can have some too), you should consider upgrading to 10.1C04.
 
The time gap is caused by the restart of the server. It takes a few minutes to get everything up and running again.

Large transactions are normal for us and usually don't cause any trouble. We recently migrated to a San server, so everything is on a new enviroment.
So imo a lot of things I encounter these days shall be related to the new server enviroment. That only seems logical.
 
Back
Top