Rob Fitzpatrick
ProgressTalk.com Sponsor
Back end: AIX 7.1
RDBMS: 10.2B02 Enterprise
Brokers: 4GL (primary), SQL
I had a database crash recently. I'm confused by what I see in the DB log. (Not the first time, not the last.)
Here is the relevant bit from the log:
Code:
SRV 3: (8873) Login usernum 99, remote SQL client.
SRV 3: (7129) Usr 99 set name to <username>.
BROKER 1: (1153) BROKER detects death of server 13107340.
BROKER 1: (8839) No SQL servers are available. Try again later.
BROKER 0: (2526) Disconnecting client 92 of dead server 3.
APW 34: (453) Logout by root on /dev/pts/0.
BROKER 0: (5028) SYSTEM ERROR: Releasing regular latch. latchId: 22
BROKER 0: (2522) User 92 died holding 1 shared memory locks.
BIW 32: (2520) Stopped.
AIW 30: (2520) Stopped.
...
other servers and processes stop, and local clients are signalled by the broker...
then the broker begins an abnormal shutdown
The dead server in question was a SQL server (PID 13107340). According to the log, the last thing it did before it died was process a login from a SQL client (Cyberquery). Presumably it also ran a select. This was six seconds before the 1153 error. From what I understand, the 1153 error by itself should not be fatal unless there was some unrecoverable situation, such as the server holding a latch when it died.
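For anyone who wants to line these events up mechanically, here is a minimal Python sketch that pulls the dead server's PID, the latch ID and the "died holding" user out of the excerpt above. It is not a Progress tool: the line pattern and the names `LINE_RE`, `parse_log` and `summarize` are my own assumptions, based only on the trimmed lines shown in this post. Full .lg lines likely carry a timestamp prefix that would have to be stripped first.

```python
import re

# Assumed line shape, based only on the trimmed excerpt in this post:
#   "<PROCESS> <number>: (<message number>) <text>"
LINE_RE = re.compile(r"^([A-Z]+)\s+(\d+):\s+\((\d+)\)\s+(.*)$")

LOG_EXCERPT = """\
SRV 3: (8873) Login usernum 99, remote SQL client.
SRV 3: (7129) Usr 99 set name to <username>.
BROKER 1: (1153) BROKER detects death of server 13107340.
BROKER 1: (8839) No SQL servers are available. Try again later.
BROKER 0: (2526) Disconnecting client 92 of dead server 3.
APW 34: (453) Logout by root on /dev/pts/0.
BROKER 0: (5028) SYSTEM ERROR: Releasing regular latch. latchId: 22
BROKER 0: (2522) User 92 died holding 1 shared memory locks.
BIW 32: (2520) Stopped.
AIW 30: (2520) Stopped.
"""


def parse_log(text):
    """Return (process, number, message number, text) tuples for matching lines."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            proc, num, msg, rest = m.groups()
            events.append((proc, int(num), int(msg), rest))
    return events


def summarize(events):
    """Collect the details that matter for this crash from the parsed events."""
    summary = {"dead_server_pid": None, "latch_id": None, "died_holding": []}
    for _proc, _num, msg, text in events:
        if msg == 1153:  # broker detects death of a server
            m = re.search(r"death of server (\d+)", text)
            if m:
                summary["dead_server_pid"] = int(m.group(1))
        elif msg == 5028:  # latch released on behalf of the dead process
            m = re.search(r"latchId:\s*(\d+)", text)
            if m:
                summary["latch_id"] = int(m.group(1))
        elif msg == 2522:  # user died holding shared memory locks
            m = re.search(r"User (\d+) died holding (\d+) shared memory locks", text)
            if m:
                summary["died_holding"].append((int(m.group(1)), int(m.group(2))))
    return summary


if __name__ == "__main__":
    print(summarize(parse_log(LOG_EXCERPT)))
```

This only restates what the log already says in a structured form. It does not resolve latch ID 22 to a latch name.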
So, questions:
- Are latch IDs always the same, from one startup to the next or one DB to the next (within a given version)? In other words, does latchId 22 mean "the _Latch record with _Latch-Id = 22", i.e. MTL_CPQ (checkpoint queue latch)?
- If so, does this help me diagnose the crash in any way? My guess is "no".
- How is it that user 92, a remote SQL client, held a shared memory lock? Does this actually mean that server 3 held the lock while serving a request from user 92?
- Has anyone run into this kind of issue in the past?