Make the crash recovery system faster

  • Thread starter: KevinRyer
Status
Not open for further replies.
KevinRyer

Guest
Details: When a Progress database crashes, it has to go through crash recovery before users are let back into the system. During recovery, all incomplete transactions are backed out and the database is returned to a consistent state. The standard answer from the Progress support team is that backing out a transaction takes approximately as long as the transaction originally ran: if a transaction was running for 5 minutes, it will take about 5 minutes to back out. This is not normally a problem, since transactions are meant to be small, but there are situations where a transaction can run for hours (or days).

Example 1: A 4GL bug. Occasionally we programmers make mistakes in our code, and a transaction spins into an infinite loop. The database then crashes when it hits the BI limit (bithold parameter). At this point the customer is down, with all users locked out of the system, for in excess of an hour.

Example 2: Internal Progress utilities (like Table Move). Recently a customer ran the Progress table move command on a large table (120 GB) and then shut the system down by accident around midnight the next day. At that point the table move had been running for around 36 hours and, if it had been allowed to back out, would have taken roughly 1.5 days before anyone could log back into the system.

While powering down the system was clearly a user error, and bugs in 4GL code are clearly a programming problem, having an entire system down for 1.5 days because a user pressed a button by accident seems harsh, to say the least.

TL;DR: Is it possible to multi-thread the crash recovery system at database startup, similar to the APWs? It takes too darn long, even on very powerful DB servers.
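For anyone who wants to put numbers on the rule of thumb quoted above (backout takes about as long as the transaction ran), here is a minimal Python sketch. The function name `estimated_backout` and the `ratio` parameter are made up for illustration and are not part of any Progress tool; the 1:1 ratio is only the support team's approximation, not a measured figure.

```python
from datetime import timedelta


def estimated_backout(transaction_runtime: timedelta, ratio: float = 1.0) -> timedelta:
    """Estimate crash-recovery backout time for one open transaction.

    Assumption: backout time is roughly `ratio` times the time the
    transaction had been running (ratio=1.0 is the quoted rule of thumb).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return transaction_runtime * ratio


# The two run times mentioned in the post.
for label, runtime in [
    ("5-minute transaction", timedelta(minutes=5)),
    ("36-hour table move", timedelta(hours=36)),
]:
    downtime = estimated_backout(runtime)
    print(f"{label}: about {downtime.total_seconds() / 3600:.2f} hours of backout")
```

With a ratio of 1.0, the 36-hour table move works out to 36 hours of backout, which is the 1.5 days mentioned in Example 2.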
