> The version of OE is 10.2B out-of-the-box
Any application licensee willing to take a 24-hour outage for a fairly major maintenance activity should be more than willing to take a short outage (~5 minutes) to install an OE service pack first. SP08 contains literally hundreds of beneficial bug fixes, some of which will certainly reduce the risk of the major maintenance activity, plus several new features, some of which will significantly reduce its required downtime. So if your mandate is to fully optimize this activity, getting them to SP08 first is table stakes.
> To date I have had limited access to this DB
See if you can get access to a similar test machine with a copy of the DB. If they don't have one, they need one. A D&L like this may require several iterations of trial and error until you land on the appropriate approach: set up a D&L scenario, run it, log results and comments, repeat with a new scenario, etc.
Understanding the performance of each phase, and how different approaches (online versus offline, single-threaded versus multi-threaded, etc.) affect it, is important. There are lots of possible strategies in each phase and we haven't covered them all here.
This machine could also be used to parallelize the work of the dump phase.
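If you do get such a box, script each scenario end to end so the timings are repeatable. A rough sketch of one iteration using binary dump and load (all database, table, and path names here are invented; adjust to suit):

    # refresh the test DB from the latest prod backup
    prorest /db/testdb /backups/proddb.pbk

    # time a binary dump of one large table
    time proutil /db/testdb -C dump order /dumps

    # build an empty target DB from the new structure, load the schema,
    # then time the load and index rebuild
    prostrct create /db/newdb /db/newdb.st -blocksize 8192
    procopy $DLC/empty8 /db/newdb
    _progres /db/newdb -b -p prodict/load_df.p -param /dumps/proddb.df
    time proutil /db/newdb -C load /dumps/order.bd
    time proutil /db/newdb -C idxbuild all

Keep the timings and promon/OS stats from each run with the scenario notes so you can compare approaches objectively.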
> Don't know yet. I suspect that they desperately "need it".
> There is also a lot of "empty" space to be reclaimed.
These quotes make me think that the DB, or at least certain large tables in it, was once much larger, so there could be primary indexes with low utilization, i.e. sparse blocks. If so, reading them in their present condition to do the dumps limits performance. So I would suggest compacting or rebuilding those indexes to minimize read I/O and optimize caching during the dump.
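For example, you could check utilization with a dbanalys and then compact the sparse indexes; idxcompact can run online. A sketch (DB, table, and index names invented):

    # report index block counts and % utilization
    proutil /db/proddb -C dbanalys > proddb.dbanalys.out

    # compact one sparse index, targeting 80% block utilization (the default)
    proutil /db/proddb -C idxcompact customer.cust-num 80

An offline idxbuild does a more thorough job but needs an outage; compaction may be enough to speed up the dump.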
> There are some minor changes to RPB planned – but they are not 'urgent'.
When you D&L, you are creating all new areas, and potentially new assignments of objects to areas, so you get any RPB changes for free, so to speak. So their urgency shouldn't be a factor. If the changes are desirable (i.e. they could improve post-D&L caching efficiency without causing excessive record fragmentation), make them a part of your approach.
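Remember that RPB is an attribute of the storage area, set in the structure file you will build the new DB from anyway. A minimal sketch (area names, numbers, paths, and values invented):

    cat >> /db/newdb.st <<'EOF'
    d "Customer Data":10,128;512 /db/newdb_10.d1
    d "Customer Index":11,64;8 /db/newdb_11.d1
    EOF

The value after the comma is the RPB (a power of 2, up to 256); the value after the semicolon is blocks per cluster (8, 64, or 512 gives you a Type II area).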
> The principal objective is performance
> Note that it was 'allowed' to dump by the primary index and I know that the time can be shortened a lot using a table scan.
It is important to know the desired outcome because it constrains the possible approaches to dumping data. And in this case it means you must optimize not only D&L duration but also post-D&L application performance.
A table-scan (no index) dump just follows the table's cluster chain and dumps the data in the order that it is found. This means it won't reorganize the data, i.e. reduce logical scatter or record fragmentation. Similarly, a multi-threaded dump of a table won't physically organize records in order of the dump index, so it may not yield the best post-D&L performance either.
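On the test box you can time the same table dumped each way and see the trade-off for yourself. A sketch (DB and table names invented; IIRC the table-scan and threaded dumps require the table to be in a Type II area, so verify against your release):

    # dump via a chosen index (index numbers come from dbanalys/idxanalys output)
    time proutil /db/testdb -C dump customer /dumps -index 1

    # dump with no index (table scan), in physical storage order
    time proutil /db/testdb -C dump customer /dumps -index 0

    # multi-threaded dump: one .bd per thread, plus a list file for the load
    time proutil /db/testdb -C dump customer /dumps -index 0 -thread 1 -threadnum 4 -dumplist /dumps/customer.lst

Then weigh the dump time saved against the post-load access patterns you care about.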
> at present it is a struggle to get each day's work done in a day.
I think this requires more analysis. A D&L can often provide a performance boost, at least for some period of time, but it won't cure all ills. Things to consider:
- Do you understand the bottlenecks in the current prod environment?
- Is the I/O subsystem up to snuff?
- BI and AI tuned?
- DB broker parameters tuned appropriately? Are there parameters not currently in use that could provide benefit? (see the startup sketch after this list)
- Client startup parameters tuned? Are they TCP clients or self-service, or a mix? SQL also?
- OS resources sufficient and used to best effect?
- Do you have full CRUD stats for tables and indexes?
- Is there evidence of application issues that cause performance problems you have to work around? E.g.:
  - heavy reads of a small control table (possible LRU latch contention)
    - possible solutions:
      - assign the table and its index(es) to the Alternate Buffer Pool
      - add -lruskips 100 (requires 10.2B06+)
  - extremely heavy reads of a larger table (possibly badly-written or poorly-bracketed queries, wrong USE-INDEX, missing index)
    - possible solutions (likely requires app vendor involvement):
      - rewrite the queries
      - add index(es) to improve bracketing
  - majority of reads of a large, heavily-read table use a non-primary index
    - possible solution: alter your dump strategy to dump via the most-used index
  - overnight read-only reports that do heavy reads and pollute the buffer pool, impacting application client performance the next day until the cache warms up again
    - possible solution: have the reporting client use private read-only buffers (-Bp), either programmatically or via a startup parameter; it won't make the reporting client any faster but it will benefit other users of the buffer pool
- Do you have metrics from promon/VSTs that could indicate bottlenecks to be addressed? (see the promon sketch after this list)
  - e.g. empty BI buffer waits, buffers flushed at checkpoint, low logical-to-physical read ratio, high latch naps, etc.
  - do you see temporal patterns in the data? (not just what the problems are but when they tend to occur)
- Are there missing background processes that are impacting performance?
  - BIW, AIW, not enough APWs
- Other resource contention issues
  - other workload on shared LUNs on a SAN
    - are you getting all the I/O bandwidth you expect?
  - other workload on the database server
    - are there database clients or ancillary applications that could run on another server instead?
  - other workload on the physical host
    - if this is a VM, could the physical host be overprovisioned, misconfigured (e.g. wrong vNIC driver or memory config), or sharing resources with work on other VMs?
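To illustrate the tuning items above, this is the sort of thing I mean. All the numbers are placeholders to show the parameters, not recommendations; -lruskips needs 10.2B06+ and the Alternate Buffer Pool needs 10.2B, so verify against your release:

    # assign a hot area to the Alternate Buffer Pool (object-level assignment is also possible)
    proutil /db/proddb -C enableB2 "Control Area"

    # broker startup: buffer pools, LRU relief, BI/AI buffers, spinlocks
    proserve /db/proddb -B 500000 -B2 50000 -lruskips 100 -bibufs 64 -aibufs 64 -spin 10000

    # background writers: a BIW, an AIW (if after-imaging is enabled), and APWs
    probiw /db/proddb
    proaiw /db/proddb
    proapw /db/proddb
    proapw /db/proddb

    # a reporting client using private read-only buffers
    mpro /db/proddb -Bp 64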
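As for promon, capture samples over time rather than one-off looks; promon reads menu choices from stdin, so you can cron a sampling script and look for the temporal patterns mentioned above. A sketch from memory (verify the R&D menu choices interactively on your release before scripting them):

    # R&D -> 2 (Activity Displays) -> 1 (Summary), then exit
    date >> /logs/promon_summary.log
    promon /db/proddb >> /logs/promon_summary.log <<'EOF'
    R&D
    2
    1
    X
    EOF

Regular timestamped samples make it much easier to correlate DB symptoms with when they occur in the business day.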
Obviously, most of these things won't be addressed with a D&L. But reading between the lines, it seems the real work here is application performance optimization, and the D&L is perceived as the means to that end. Certainly it can help if done well, but the larger problem should be addressed holistically to achieve lasting and maximum benefit.