Question: Scan a table between recids.

ron

Member
10.2B. Enterprise on Linux.

Hi ...

I have done many dumps/reloads in the past -- always using a fully-scripted binary dump/load method. All went very well.

I am in a new environment where the DB is big (800 GB) and hasn't been "straightened out" for a very long time. I am experimenting with ways to do a D/L in a "reasonable" time-frame. On a test system where the only "smart" aspect is splitting the dump into 15 parallel streams -- the job takes 5 full days end-to-end.

As you might expect -- I have a group of 'big' tables that each take many hours to dump -- one takes 16 hrs, one 13 hrs and another two take 5 hrs each. I am going to try tackling the big tables by dumping them in sections with 4GL code.

My question is: is there a way to code a FOR EACH loop to begin and stop at particular recids? I appreciate that I can use unique index keys to bracket what I'm after -- but just wondered if it could be done with recids.

Ron.
 

TomBascom

Curmudgeon
Yes. It is possible.

(It’s late, and I am jammed tomorrow so it might be a day or so before I can expand on that...)
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
  • What is the service pack of 10.2B? It is relevant when discussing D&L.
  • Can you describe your process in more detail?
  • How much time does it take for each phase (dump, load, rebuild)? In other words, which parts require the most tuning?
  • How long has it been since the source primary indexes have been compacted or rebuilt? Do they need it?
  • What is the current structure? Are the tables to be dumped in Type II storage already?
  • How are you dumping, single- or multi-threaded? DB online or offline? Using -RO? Table scan option (-index 0)?
  • Do you have the option of speeding the process by dumping simultaneously on several machines with separate storage?
  • Do you have any specific outcomes that are objectives of this dump and load, apart from optimizing its duration?
 

TomBascom

Curmudgeon
Rob's questions are great too.

To elaborate on my original answer:
Code:
/* recid.p
*/


/* this part is just for giggles
*/

function numRows returns int64 ( input tblName as character ):

  define variable i as int64  no-undo.
  define variable b as handle no-undo.
  define variable q as handle no-undo.

  create buffer b for table tblName.
  create query q.

  q:set-buffers( b ).
  q:query-prepare( substitute( "preselect each &1 no-lock", tblName )).                 /* use "preselect" to get a record count        */
  q:forward-only = yes.
  q:query-open().

  i = q:num-results.

  q:query-close().

  delete object q.
  delete object b.

  return i.

end.


define variable RPB       as integer no-undo.
define variable CSZ       as integer no-undo.
define variable minRecid  as int64   no-undo.                                           /* we could figure this out based on CSZ        */
define variable maxRecid  as int64   no-undo.                                           /* no point in going higher                     */
define variable dumped    as int64   no-undo.

for each _file no-lock where _file._hidden = no:

  find _storageobject no-lock                                                           /* we need to find the *current*                */
    where _storageobject._db-recid = _file._db-recid                                    /* storage area -- not the *initial*            */
      and _storageobject._object-type = 1                                               /* storage area that holds the table            */
      and _storageobject._object-number = _file._file-num.                              /* ( _file._ianum = initial area)               */

  find _area no-lock where _area._area-number = _storageobject._area-number.
  find _areastatus no-lock where _areastatus._areastatus-areanum = _storageobject._area-number.

  assign
    RPB       = exp( 2, _Area-RecBits )
    CSZ       = _Area-ClusterSize
    minRecid  = ( 1 * RPB ) + 1                                                         /* there is always at least a masterblock       */
    maxRecid  = ( _AreaStatus-HiWater * RPB ) + ( RPB - 1 )
  .

  display
    _file._file-name
    _area-name
    RPB
    CSZ
    maxRecid
    numRows( _file-name )
  .

  run dumpbyrecid.p ( _file-name, minRecid, maxRecid, output dumped ) value( _file-name ).

  display dumped.

end.
Code:
/* dumpbyrecid.p
*
* {1} = table name
*
*/
 
define input        parameter tableName as character no-undo.
define input        parameter minRecid  as int64     no-undo.
define input        parameter maxRecid  as int64     no-undo.
define output       parameter dumped    as int64    no-undo.

define variable i as int64 no-undo. 

define stream dumpStrm.

output stream dumpStrm to value( substitute( "&1.d", tableName )).

do i = minRecid to maxRecid:

  find {1} no-lock where recid( {1} ) = i no-error.

  if available( {1} ) then
    do:
      export stream dumpStrm {1}.
      dumped = dumped + 1.
    end.

  /* i = i + 1. */

end.

output stream dumpStrm close.

return.
 

TomBascom

Curmudgeon
Dumping by RECID is possible. There are pros & cons:

PRO:
  • useful when the db is damaged and you cannot traverse indexes
  • useful if you want to avoid IO on indexes which might sometimes be faster
CON:
  • not exactly intuitive
  • if the rows-per-block setting badly mismatches the actual data, you will do lots of unsuccessful FINDs (the difference between the last two columns displayed)
  • if you have LOBs my sample code isn't doing anything with them
In theory you could do a better job picking the minRecid. I accounted for the masterblock but I was too lazy to dig out the details on other "overhead" blocks in a storage area.
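To put numbers on that rows-per-block mismatch: the min/max recid formulas from recid.p are simple arithmetic, sketched below (Python here purely for illustration; the function names and the example figures are mine, not from the thread):

```python
# Sketch of the recid arithmetic used in recid.p above.
# Assumes recids map as (block-number * rows-per-block) + slot,
# with rows-per-block = 2 ^ RecBits, as the ABL code does.

def recid_range(rec_bits, hi_water):
    """Given an area's record bits and high-water mark (in blocks),
    return the (min, max) recid range that recid.p would scan."""
    rpb = 2 ** rec_bits                 # rows per block = 2^RecBits
    min_recid = (1 * rpb) + 1           # skip block 0: always a masterblock
    max_recid = (hi_water * rpb) + (rpb - 1)
    return min_recid, max_recid

def wasted_find_ratio(rec_bits, hi_water, actual_rows):
    """Fraction of candidate recids with no record behind them, i.e.
    the unsuccessful FINDs (the gap between the last two display
    columns in recid.p's output)."""
    lo, hi = recid_range(rec_bits, hi_water)
    candidates = hi - lo + 1
    return 1 - (actual_rows / candidates)

# Hypothetical example: RecBits = 6 (64 rows per block), a
# 1,000,000-block high-water mark, but only 20 million actual rows.
lo, hi = recid_range(6, 1_000_000)
ratio = wasted_find_ratio(6, 1_000_000, 20_000_000)
```

With an area whose RPB is badly oversized for the data, the ratio heads toward 1: in the hypothetical numbers above, nearly 69% of the FINDs come back unavailable, which is the CON being described.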

In a storage area with multiple tables you could probably come up with a way to map chunks that you don't need to look at. But if this is going to be used with big tables that dominate the areas they are in that probably isn't worth bothering with.

I used "compile on the fly" for this code. You could use dynamic queries if you really want to (or need to). I expect that would be a bit slower though.
 

ron

Member
Hi Rob - apologies for the delay in replying -- I got diverted onto other issues.

Below are the responses to your questions. My earlier question was part of a "collecting ideas" thing -- because although I have dumped/reloaded more DBs than I can count -- this one is in poor shape -- and pretty big.

  • What is the service pack of 10.2B? It is relevant when discussing D&L.
The version of OE is 10.2B out-of-the-box (as of Mon Dec 14 17:00:19 EST 2009).

I won't offer excuses for it .... it isn't "my" DB!! I'm just being asked to help.

(BTW: "My" DB is 11.7.4. )

  • Can you describe your process in more detail?
I have a fully-scripted binary D/L process I developed some years back that dumps in multiple streams. It produces a dbanalys report at the start and the end – parses them and verifies exact record counts by table. I have used it many times and it works well – but the new challenge is this 10.2B DB, which has not had a D/L for years and is ~800GB in size. I am experimenting with different techniques to find one that can get this done in an acceptable time-frame. What is "acceptable"? All I'm told is "it shouldn't take more than 24 hours".

  • How much time does it take for each phase (dump, load, rebuild)? In other words, which parts require the most tuning?
To date I have had limited access to this DB – but I did manage to do a test serial binary dump – and that took 62 hours. Note that it was 'allowed' to dump by the primary index and I know that the time can be shortened a lot using a table scan. That is certainly an option for at least the 12 really large tables.

  • How long has it been since the source primary indexes have been compacted or rebuilt? Do they need it?
Don't know yet. I suspect that they desperately "need it".

  • What is the current structure? Are the tables to be dumped in Type II storage already?
All Type II – and as far as I can see it is organized quite well. Really large tables are in their own Areas.

  • How are you dumping, single- or multi-threaded? DB online or offline? Using -RO? Table scan option (-index 0)?
My test was single-threaded – deliberately so I knew by table how big the hurdles were. Online – with -RO.

  • Do you have the option of speeding the process by dumping simultaneously on several machines with separate storage?
It has been discussed – but it would not be easy to do. It takes 7 hours to take a backup .... so distributing it to other servers is time-consuming.

  • Do you have any specific outcomes that are objectives of this dump and load, apart from optimizing its duration?
Currently the block size is 4K and the plan is to increase it to 8K. There are some minor changes to RPB planned – but they are not 'urgent'. The principal objective is performance – at present it is a struggle to get each day's work done in a day. There is also a lot of "empty" space to be reclaimed.
 

TomBascom

Curmudgeon
Even though a backup and restore to another server may take time, once that is done you can use after-imaging or replication to keep it in sync. I have used additional databases on temporary hardware like that with great success. In one case the initial serial d&l was in excess of 2 weeks. We ultimately got it done in 6 hours.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
The version of OE is 10.2B out-of-the-box
Any application licensee willing to take a 24-hour outage for a fairly major maintenance activity should be more than willing to take a short outage (~5 minutes) to install an OE service pack (SP08) first. It delivers literally hundreds of beneficial bug fixes (some of which will certainly reduce the risk of the major maintenance activity), not to mention several new features, some of which will significantly reduce the required downtime of that activity. So if your mandate is to fully optimize this activity, getting them to SP08 first is table stakes.

To date I have had limited access to this DB
See if you can get access to a similar test machine with a copy of the DB. If they don't have one, they need one. A D&L like this may require several iterations of trial and error until you land on the appropriate approach: set up a D&L scenario, run it, log results and comments, repeat with a new scenario, etc.

Understanding the performance of each phase, and how different approaches (online versus offline, single- versus multi-threaded, etc.) affect it, is important. There are lots of different strategies possible in each phase and we haven't covered them all here.

This machine could also be used to parallelize the work of the dump phase.

Don't know yet. I suspect that they desperately "need it".
There is also a lot of "empty" space to be reclaimed.
These quotes make me think that the DB, or at least certain large tables in it, was once much larger, so there could be primary indexes with low utilization, i.e. sparse blocks. If so, reading them to do the dumps, in their present condition, limits performance. So I would suggest compacting or rebuilding those indexes to minimize read I/O and optimize caching during the dump.

There are some minor changes to RPB planned – but they are not 'urgent'.
When you D&L, you are creating all new areas, and potentially new assignments of objects to areas, so you get any RPB changes for free, so to speak. Their urgency shouldn't be a factor, then. If the changes are desirable (i.e. they could improve post-D&L caching efficiency without causing excessive record fragmentation), make them a part of your approach.

The principal objective is performance
Note that it was 'allowed' to dump by the primary index and I know that the time can be shortened a lot using a table scan.
It is important to know the desired outcome because it constrains the possible approaches to dumping data. And in this case it means you must optimize not only D&L duration but also post-D&L application performance.

A table-scan (no index) dump just follows the table's cluster chain and dumps the data in the order that it is found. This means it won't reorganize the data, i.e. reduce logical scatter or record fragmentation. Similarly, a multi-threaded dump of a table won't physically organize records in order of the dump index, so it may not yield the best post-D&L performance either.

at present it is a struggle to get each day's work done in a day.
I think this requires more analysis. A D&L can often provide a performance boost, at least for some period of time, but it won't cure all ills. Things to consider:
  • Do you understand the bottlenecks in the current prod environment?
  • Is the I/O subsystem up to snuff?
  • BI and AI tuned?
  • DB broker parameters tuned appropriately? Are there parameters not currently in use that could provide benefit?
  • Client startup parameters tuned? Are they TCP clients or self-service, or a mix? SQL also?
  • OS resources sufficient and used to best effect?
  • Do you have full CRUD stats for tables and indexes?
  • Is there evidence of application issues that could be causing performance issues, that you have to work around? E.g.:
    • heavy reads of a small control table (possible LRU latch contention)
      • possible solutions:
        • assign table and index(es) to Alternate Buffer Pool
        • add -lruskips 100 (requires 10.2B06+)
    • extremely heavy reads of a larger table (possibly badly-written or poorly-bracketed queries, wrong USE-INDEX, missing index)
      • possible solutions (likely requires app vendor involvement):
        • rewrite the queries
        • add index(es) to improve bracketing
    • majority of reads of a large, heavily-read table use the non-primary index
      • possible solution: alter your dump strategy to dump via the most-used index
    • overnight read-only reports that do heavy reads and pollute the buffer pool, impacting application client performance the next day until the cache warms up again
      • possible solution: if possible, have the client use private read-only buffers, either programmatically or via startup parameter; won't make the reporting client any faster but will benefit other users of the buffer pool
  • Do you have metrics from promon/VSTs that could indicate bottlenecks to be addressed?
    • e.g. empty BI buffer waits, buffers flushed at checkpoint, low logical-to-physical read ratio, high latch naps, etc.
    • do you see temporal patterns in the data? (not just what the problems are but when they tend to occur)
  • Are there missing background processes that are impacting performance?
    • BIW, AIW, not enough APWs
  • Other resource contention issues
    • other workload on shared LUNs on a SAN
      • are you getting all the I/O bandwidth you expect?
    • other workload on the database server
      • are there database clients or ancillary applications that can run on another server instead?
    • other workload on the physical host
      • if this is a VM, could the physical host be overprovisioned, misconfigured (e.g. wrong vNIC driver or memory config), or sharing resources for work on other VMs?
Obviously, most of these things won't be addressed with a D&L. But reading between the lines, it seems the real work here is application performance optimization and the D&L is perceived as the means to that end. Certainly it can help if done well but the larger problem should be addressed holistically to achieve lasting and maximum benefit.
 