Forum Post: Can the different worlds hear each other?

  • Thread starter: George Potemkin
Status: Not open for further replies.

George Potemkin
Guest
I'm trying to analyze my disappointment after the Info Exchange at the EMEA PUG Challenge 2014. We voted on enhancement requests: programmers voted for DBA stuff, DBAs for new language features. If you did not vote for a feature because it was outside your area of competence, that counted as a vote against it, and there was no opportunity to advocate for the suggested features. As a result, DBAs from small shops did not support the features required by big clients. In other words, they voted against their own future.

By "big clients" I mean sites with terabyte databases, thousands of connections /and/ strong business requirements. How long does it take to restart a database in a small shop? A few minutes. For big clients it can take an hour or more. The larger the database, the greater the chance of large transactions that need to be undone; the higher the number of connections, the greater the chance that some sessions will hang because they were interrupted in an unexpected place. And one minute of downtime costs far more for a big client. I have seen DBAs in small shops restart a database in the middle of working hours just because I suggested changing the values of some startup parameters. And I have seen a 'big client' decide not to restart a database even in a painful situation (users could not connect to or disconnect from the database due to some issue) just to avoid downtime in the middle of the day. So no matter how good our recovery plan is, we cannot guarantee that downtime will be short for big clients. We cannot even predict how long a restart will take in a concrete case - we simply cannot see what is going on inside a database during its shutdown or its startup.

Big clients have no options to repair indexes. Not a single one. I'm not kidding. Take, for example, 'idxfix / 1. Scan records for missing index entries'. The time to run idxfix grows as (db size)^2.
The more records in the database, the longer idxfix will run. But there is another factor: the bigger the indexes, the longer the disk I/O waits while jumping from blocks in the table's area to blocks of one index, then to blocks of another index, and so on. For example, 'idxfix / scan records' running against a 400 MB database on my laptop:

Record operations: 507,533.39 per sec
Index operations: 253,766.70 per sec
Total o/s reads: 2,149.00 per sec

The same option of idxfix running against a 2 TB database on a million-dollar box with Hitachi USP storage arrays:

Record operations: 1,480.00 per sec
Index operations: 2,959.50 per sec
Total o/s reads: 163.00 per sec

Is my laptop 340 times faster than the customer's "monster"? Of course not! But it turned out that idxfix would take more than 2 months (sic!) to scan all records in the 2 TB database. Fortunately, not a year. But wait! That is just one option of idxfix. We also need to run option 2: 'Scan indexes for invalid index entries'. Option 2 is slower than option 1 because for each index idxfix re-reads all records from the table's area to validate them. Moreover, the numbers above were taken on a backup server with no other users running against the database. We would, of course, like to run idxfix in a production environment, where users lock records, so idxfix will wait on locked records. Practice shows that the default value of -lkwtmo (1800 seconds) is not enough for idxfix; hence idxfix might process just 0.0005 locked records per second. It's crazy, but I would not be surprised if these two idxfix options together needed a year to complete in a production environment for one database (not the biggest one) - provided the database is not restarted during the year. Otherwise we would need to start from the beginning. Who would wait a year to fix index corruption? Even a month is not an option.
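As a back-of-envelope check of the numbers above (only the per-second rates come from my measurements; the record count for the 2 TB database is a made-up round figure for illustration):

```python
# Measured idxfix 'scan records' throughput (records processed per second).
LAPTOP_RATE = 507_533.39   # 400 MB database on a laptop
BIG_BOX_RATE = 1_480.00    # 2 TB database on the Hitachi USP box

def scan_days(record_count, records_per_sec):
    """Days for idxfix option 1 to scan record_count records at a given rate."""
    return record_count / records_per_sec / 86_400  # 86,400 seconds per day

ratio = LAPTOP_RATE / BIG_BOX_RATE
print(f"laptop / big-box throughput ratio: {ratio:.0f}x")  # ~343x

# Hypothetical: assume the 2 TB database holds about 8 billion records.
days = scan_days(8e9, BIG_BOX_RATE)
print(f"estimated scan time: {days:.0f} days")  # ~63 days, i.e. more than 2 months
```

With any plausible record count for a 2 TB database, the result lands in the same range: months, not hours.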
We can build a multi-threaded record scan ourselves using idxfix (why does it not exist out of the box from Progress?), but not a multi-threaded index scan - only the developers at PSC can provide a solution for the second idxfix option. It is much faster to rebuild the indexes than to run idxfix: that would take "just" a couple of days. But two days of database downtime is not an option for big clients - it is too costly, including for their reputation. We could add a new index, activate it online and then drop the old (corrupted) index. But the new locking protocol introduced in V10.2B06 makes it impossible to use idxactivate unless an existing unique index can be used to build the new one. What if a table does not have a unique index? What if we need to rebuild a unique index itself?

Big clients also have an issue with data replication. The time to roll forward an AI file on the target database used to be about 10% of the time it took that AI file to grow on the source database, even for the most transaction-active databases. But for terabyte databases that 10-20% can turn into 200-300% due to scattered updates on the source. Note that if roll forward is persistently slower than 100%, we are in fact losing the warm spare database - the gap between the databases will only grow with time. The same issue exists for OE Replication, but there we do not even have metrics to measure how fast the replication agent on the target database is. The delay in replication between the databases is usually blamed on network problems, or on the replication agent not running on the target for some reason. But a delay of zero does not mean the replication agent is idle. In other words, big clients do not have a reliable mechanism to maintain a spare database.

Small shops do not have these issues - why would they vote for enhancement requests meant to solve them? The big clients are few, and each has only one vote, no matter how much they paid for their license. Does PSC have a future if it does not hear its big customers?
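The roll-forward arithmetic can be sketched as follows. The "apply ratio" is roll-forward time on the target divided by the time the same AI activity took on the source; the 10% and 250% figures below are illustrative values in the ranges mentioned above, not measurements:

```python
def replication_lag(apply_ratio, hours):
    """Cumulative lag (in hours of AI activity) on the target database.

    Each hour of source activity costs apply_ratio hours to apply on
    the target, so the lag grows by (apply_ratio - 1) per hour whenever
    the ratio exceeds 1 (i.e. roll forward is slower than 100%).
    """
    lag = 0.0
    for _ in range(hours):
        lag = max(0.0, lag + (apply_ratio - 1.0))
    return lag

print(replication_lag(0.10, 24))  # healthy 10% ratio: lag stays at 0.0
print(replication_lag(2.50, 24))  # 250% ratio: 1.5h further behind per hour -> 36.0
```

This is why a ratio persistently above 100% means the spare is lost in practice: the gap never shrinks, it only accumulates.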
