Rolling back an incomplete transaction

dlangschied · Apr 4, 2011

I had a run away process kill my bi. I am rolling back, but I am unsure how to determine how long it will take. This is an OE 10.1C database. Any ideas on how to figure out where the roll back is in its process?

dlangschied · Apr 4, 2011

I should point out that I am referring to disaster recovery and not ai roll back, though after an hour of waiting I am getting a little jumpy on the recovery.

RealHeavyDude · Apr 5, 2011

I am assuming you are talking about the crash recovery - that takes place in two phases:

Undo all transactions that were not committed when the "crash" occurred
Write all transactions that were committed but not yet written to the database when the "crash" occurred.

When you start the server in a console window or run a truncate BI then you will see in which phase it is.

HTH, RealHeavyDude.

TomBascom · Apr 5, 2011

Crash recovery in such a situation can take quite a while.

Basically crash recovery is going to have to reverse all of that activity and then redo a bunch more. So if it took a couple of hours for an out of control TRX to "kill your bi" then it would not be surprising to take at least that long to undo it.

This kind of thing is especially painful on poor quality disk subsystems -- like RAID5 (and its cousins RAID4 and RAID6 and RAID-DP etc).

dlangschied · Apr 7, 2011

I was looking for a way to determine progress of the crash recovery. I did not know (and am still looking into) what caused the crash, so I could not determine exactly how long I would have to wait for the recovery. I have only experienced this a few times and it was always easy to determine the culprit. In this particular case, the only thing that I have to go on is an import of production schedules into my QAD environment that had some bad data. My data validation caught the error and protected the database (or so I thought), but created problems on the OS that generated numerous e-mail alerts. I fixed the issue on the file system and all seemed good. Then 2 hours later the BI is caput. My thinking so far is that the validation process was hung up and generating the BI. However, the only thing that the process does in the database is create temp records in the database for valid records. I am rechecking the sanity of this code to see if this could have occurred. Is there anyway, post-crash recovery, that I can nail this down the culprit securely vis-a-vis the AI, or anything?

cj_brandt · Apr 7, 2011

If you look at the AI logs, you can sometimes find out helpful information. We have used it to find a large amount of updates to a couple tables - you can also find long running transactions.

Looking at the AI logs is a lot easier to track if the areas have a single table in them, there is some info in the progress kb about looking through the output of scan verbose on the ai logs.

Another option we have done is apply the ai files to a standby database to see what was occurring.

dlangschied · Jul 12, 2011

Thanks to all for their information. In this particular case, I do not think that I could have been helped by looking for a large transaction. The time necessary to do that was not available to me. I have instituted some alerts that will, hopefully, tell me that trouble is brewing before I end up with a large transaction to roll back.

Rolling back an incomplete transaction

dlangschied

Member

dlangschied

Member

RealHeavyDude

Well-Known Member

TomBascom

Curmudgeon

dlangschied

Member

cj_brandt

Active Member

dlangschied

Member