You could look at the Table Partitioning add-on product as an alternative approach to purging data. But my intuition says it isn't the right solution here.
> Every December we delete about 200,000 records from our DB. Each record takes about 3 seconds to delete via a vendor-supplied deletion tool, and the process runs out of hours and at weekends. This is followed up by a dump and reload/reindex on New Year's Eve, run by the vendor.
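Just to put those numbers in perspective, here's the back-of-the-envelope arithmetic, using only the figures quoted above:

```sql
-- 200,000 records at roughly 3 seconds per record:
SELECT 200000 * 3           AS total_seconds,  -- 600,000 s
       200000 * 3 / 3600.0  AS total_hours,    -- about 167 h
       200000 * 3 / 86400.0 AS total_days;     -- about 7 days of wall-clock deleting
```

That's nearly a week of continuous deleting, squeezed into evenings and weekends.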
This raises red flags for me. First, 200,000 records is very little data in the grand scheme of things. Second, deleting at a rate of 0.33 records/second is ridiculously slow. It makes me think that one or more of the following is possible:
- the hardware is woefully inadequate;
- the database is poorly configured for transaction throughput;
- the database is on some shared infrastructure (e.g. virtualization host or SAN) that is overloaded;
- the software is doing more than you think it's doing (e.g. deleting many related child records for each of these 200,000 records, or executing some complicated business logic to determine what to delete);
- the queries in the deletion tool are poorly written or poorly bracketed; in the worst case, if there isn't an index that matches the `where` clause, it could be doing a full table scan on what is presumably a large table (see the sketch after this list).
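To make that last bullet concrete, here is a minimal sketch. The table, column, and index names are invented for illustration; I have no idea what the vendor's tool actually executes:

```sql
-- Hypothetical purge statement. Without an index whose leading column
-- is expiry_date, most databases must scan the entire table to find
-- the rows to delete.
DELETE FROM order_history
 WHERE expiry_date < DATE '2015-01-01';

-- A supporting index lets the same delete locate its rows directly:
CREATE INDEX ix_order_history_expiry
    ON order_history (expiry_date);
```

And if the tool deletes one record at a time from a client, each delete also pays a network round trip on top of whatever plan the database chooses.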
You say you have a DBA team; you're lucky! They should be able to provide a detailed picture of what this utility does, even if they don't have access to the source code. They should also understand the platform on which the database is running. In other words, they should be able to tell you which of the above bullet points are relevant here.
> As a process it's clearly hugely inefficient and also very costly.
Absolutely!
You should ask yourself (or whoever is calling for this process to be done annually): what is the value of this process? As you've canned it for this year, clearly you don't see business value. Is there technical value? Someone, either internally or at the vendor, designed this process and put it into effect. What was their intent? Conserving disk space? Improving application performance? Punishing DBAs?
Some people have the mistaken notion that "if I purge data from this table then code that reads it will run faster". In fact, somewhat counter-intuitively, deleting records from a table can make code slower: the deleted rows leave behind sparsely filled blocks and fragmented indexes, so until you reorganize, queries read just as many blocks to return fewer rows. When I encounter a purge process it generally involves, at a minimum, several GB to tens of GB of data. Even if these are very large records, 200K amounts to what, maybe a few hundred MB of data? A maintenance activity to deal with that little data seems pointless to me.
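For the sake of argument, assume a fairly generous average record size of 1 KB (my assumption, not a figure from your system):

```sql
-- 200,000 records at an assumed 1 KB per record:
SELECT 200000 * 1024                     AS approx_bytes,  -- 204,800,000
       200000 * 1024 / (1024.0 * 1024.0) AS approx_mb;     -- roughly 195
```

Even if the real average is several times that, you're still only in "few hundred MB to low GB" territory.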
If you do purge data, you do have to do maintenance afterward to avoid the problem I described above. Your decision to go with index compaction is good: it can be done online or offline and isn't too disruptive. But a full dump and load of one table, on a continual basis, is overkill. If your data storage areas are structured properly, you really shouldn't need to dump and load your data often; think once every several years. And when you do a D&L you should have specific reasons for doing so and specific, measurable desired outcomes.
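The exact commands depend on your platform, which your DBA team will know. Purely as an illustration, on databases that expose index maintenance through SQL, the compaction/rebuild step looks something like this (index name carried over from the hypothetical example above):

```sql
-- PostgreSQL: rebuild one index (locks writes unless done CONCURRENTLY)
REINDEX INDEX ix_order_history_expiry;

-- Oracle: rebuild the index without blocking DML against the table
ALTER INDEX ix_order_history_expiry REBUILD ONLINE;
```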
> …a dump and reload/reindex on New Year's Eve, run by the vendor.
Just curious: if you have not just one DBA but a DBA team, why was the vendor doing the D&L?