ChUIMonster
Some things can be done well in advance and should not be part of the outage -- for instance, the empty database should be built, have the .df loaded, and the extents pre-sized a day or two ahead of time.

It also sounds like you might be doing everything sequentially -- i.e. dump the data, transfer the data, load the data. If that is true, there are a couple of things you can do that will save you a lot of outage time. Instead of a "transfer the data" step, use a shared filesystem: NFS mount some disks and dump to those disks. As soon as the dump of a table is complete you can start loading it.

How do you know it is safe to start loading? Have the dump go to a "dump" directory and use the Linux "mv" command on the dump side to rename the .bd file from the dump directory to a "stage" directory. IOW:

proutil dbname -C dump tableName /share/dump ; mv /share/dump/tableName.bd /share/stage

then on the load side do something like:

mv /share/stage/tableName.bd /share/load ; proutil dbname -C load /share/load/tableName.bd -r ; mv /share/load/tableName.bd /share/arc

In this way only *active* dump files will be in the dump dir and only active load files will be in the load directory. Any file that is in the stage directory is ready to be loaded. The dump and load processes can overlap, and naturally small tables will be ready to load first. In a lot of cases the time for the whole process becomes the time needed to dump the largest table, plus the time needed to load that table, plus the index rebuild. (All the smaller stuff will have been completed by the time the biggest table is ready to load.)

You can experiment to find the balance of dump threads that gives the fastest time. I often find that:

1) dump all of the tables that are expected to be very small in one thread -- this is usually something like half or two thirds of all the tables.
2) dump the largest table in its own thread.
3) spread the rest among N - 1 threads round robin, where N is the number of full cores available (NOT hyper-threaded cores).

95% of the time that's going to be more than good enough. If your disks and network are really fast you can try more threads, but you are probably going to bottleneck on the disks on the dump side.

On the load side -- *most* of the time a single loader thread with idxbuild all at the end of the process is going to be fastest. That is at least partly because most of the time one big table dictates the schedule: everything else is done by the time the biggest table is ready to start loading.
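The dump / stage / load handoff above can be sketched as a small shell script. This is a minimal sketch: the actual proutil dump and load calls are stubbed out (the table names and the /share-style directory layout are illustrative assumptions) so the file-handoff flow itself can be exercised anywhere; on a real system the two stub functions would be replaced by the proutil commands shown above.

```shell
#!/bin/sh
# Sketch of the dump -> stage -> load -> arc handoff.
# proutil is stubbed so this runs without an OpenEdge install.

SHARE=$(mktemp -d)
mkdir -p "$SHARE/dump" "$SHARE/stage" "$SHARE/load" "$SHARE/arc"

dump_table() {   # stub for: proutil dbname -C dump "$1" "$SHARE/dump"
    echo "rows for $1" > "$SHARE/dump/$1.bd"
}

load_table() {   # stub for: proutil dbname -C load "$1" -r
    cat "$1" > /dev/null
}

# Dump side: write into dump/, then mv (an atomic rename on the same
# filesystem) into stage/ so the loader never sees a partial file.
for t in customer order order_line; do
    dump_table "$t"
    mv "$SHARE/dump/$t.bd" "$SHARE/stage/"
done

# Load side: claim a staged file by moving it into load/, load it,
# then archive it so only active files sit in load/.
for f in "$SHARE"/stage/*.bd; do
    [ -e "$f" ] || continue
    b=$(basename "$f")
    mv "$f" "$SHARE/load/"
    load_table "$SHARE/load/$b"
    mv "$SHARE/load/$b" "$SHARE/arc/"
done

ls "$SHARE/arc"   # all three .bd files end up in arc/
```

The key design point is that mv within one filesystem is a rename, so a file only ever appears in stage/ complete; in real use the two loops would be separate, concurrently running scripts (or several dump threads feeding one loader).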
Continue reading...