Had to reboot our ancient, rusty 8.3A QAD database (running on HP-UX 11i) today due to a nasty issue with users who were either kicked by an automated proshut script (it kicks users after about 10 minutes of holding a transaction open, to protect the BI file from hitting 1 GB and stalling the DB) or who cancelled and restarted their Telnet sessions. For some reason, the watchdog process was not cleaning these up, so their locks and transactions persisted and blocked other users. We eventually cleared the transactions by killing the users' Unix sessions with kill -9 (after a regular kill failed), which at first seemed to let other users process transactions. But soon it was reported that others were still blocked, and we noticed about 100 extra users: "zombie" sessions that were not being cleaned up by the watchdog (bad watchdog! BAD!).
So, my understanding is that the watchdog is supposed to clean up sessions when the user disconnects, whether the disconnect was done through proshut or otherwise, but that didn't happen. What can prevent the watchdog from doing so? Is there an OS component to that (maybe we should shorten the TCP_KEEPALIVE timer to help push things along)? Or is this just an 8.3A bug? Would it help to kill and restart the WDOG when this happens?
(AFAIK, this is the first time this issue has happened, in about 10 years of running the DB. We've added a lot more users lately, though.)