George Potemkin
Guest
I'm thinking about the following plan for what to do if we get the "error in undo" again. Any comments are welcome.

0. Watch the db logs closely (for example, once per second). If the error happens, then:
1. Freeze the watchdog process (kill -SIGSTOP). This prevents the watchdog from dying while it undoes the dead client's transaction, so the database will not crash immediately.
2. Optionally proquiet the database. Any changes made from this point on may be lost. We need time to make a decision.
3. Get full information about the dead client's transaction: mainly the transaction start time and the number of notes written and read for the current transaction.
4. Based on this information, decide between two options: switch to a warm standby database and roll forward the AI files to a point in time before the transaction began, or (if the transaction was opened a long time ago) continue with the current state of the database, even if we are forced to use the -F option to open the db.
5. If we choose to use the -F option, then:
5.1 Disable the "quiet" point and disconnect all db sessions except, of course, the dead one.
5.2 Proquiet the database again to write all dirty blocks to disk.
5.3 Shut the database down (emergency shutdown?). Of course, the database will not close normally, because the dead session's transaction can't be undone due to the error.
5.4 Truncate bi -F. At this point we expect to lose only some of the changes made by the dead uncommitted transaction. The changes made by other transactions should already be saved on disk.
5.5 When the db is up and running, eliminate the changes made by the dead transaction. To find those changes we can use (with a bit of luck) AI scans.

Did I miss any points?
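Steps 0 and 1 could be automated along these lines. This is only a minimal sketch in Python, not a tested procedure: the `UNDO_ERROR` pattern, the function names, the log path and the way the watchdog PID is obtained are all assumptions for illustration, and the actual wording of the message in the db log has to be checked before relying on it. The script polls the log once per second and sends SIGSTOP to the given PID as soon as a matching line appears.

```python
import os
import re
import signal
import time

# Assumption: the exact wording of the error line is site-specific; adjust the pattern.
UNDO_ERROR = re.compile(r"error in undo", re.IGNORECASE)


def first_undo_error(lines):
    """Return the first line that matches the assumed error pattern, or None."""
    for line in lines:
        if UNDO_ERROR.search(line):
            return line.rstrip("\n")
    return None


def follow(path, interval=1.0):
    """Yield batches of lines appended to the file, polling once per interval."""
    with open(path, "r", errors="replace") as log:
        log.seek(0, os.SEEK_END)  # skip history, watch only new messages
        while True:
            batch = log.readlines()
            if batch:
                yield batch
            time.sleep(interval)


def watch_and_freeze(log_path, watchdog_pid, interval=1.0):
    """Freeze the watchdog (SIGSTOP) as soon as the error shows up in the log."""
    for batch in follow(log_path, interval):
        hit = first_undo_error(batch)
        if hit is not None:
            # SIGSTOP cannot be caught or ignored; resume later with SIGCONT.
            os.kill(watchdog_pid, signal.SIGSTOP)
            return hit
```

A call would look like `watch_and_freeze("/path/to/db.lg", watchdog_pid)`, where both the path and the PID are placeholders that must be filled in for the real database. Polling keeps the sketch simple; a message that is written in two parts across a polling boundary could be missed, so a production version would need to buffer incomplete lines.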