Index Build

Version: 11.6
OS: Linux CentOS 7
CPU: 4 CPUs with 2 cores each (8 cores total)

I have 3 questions here; sorry if I should have posted them as separate threads.

Question 1:
How do I come up with optimal values for -datascanthreads and -mergescanthreads?

Question 2:
Does -B have any effect on index build? The reason I ask is that even if I give -B as 40 GB, the index build consumes almost all free memory (around 110 GB) while it runs [I was monitoring with htop].

Question 3:
I created a DB on local disk 3 weeks back. It took roughly 12 hours to complete the D&L process for the 1.5 TB database and 5.8 hours to complete the index build. I was able to log in and do some smoke testing on the application, and everything felt good [sudo /opt/dlc/bin/proutil /localdisk/xyzdb/xyz.db -C idxbuild all -i -TB 64 -TM 32 -TMB 512 -SG 64 -thread 1 -TF 80 -datascanthreads 12 -mergethreads 8 -T /localdisk/tmpForidx -B 5000000].

A couple of days back I created a new DB on localdisk with just one large table of 350 GB and ran index rebuild multiple times with different DST & MST values (first run DST 16 MST 8, second run DST 20 MST 12, third run DST 32 MST 20, etc.). With DST 16 MST 8 the index build took around 72 minutes, whereas DST 32 and MST 20 gave better results, completing in around 48 minutes. Then I loaded one more table, our second largest at another 200 GB, tested the same combinations, and again found the best results with DST 32 and MST 20.

Looking at those results, I ran index rebuild on the DB that I created 3 weeks back with DST as 32, MST as 20, and the rest of the parameters the same as above. It took 32 hours to complete (as a sample, the largest table that took 45 minutes in the 2-table DB test takes around 7 hours on this 1.5 TB DB). I/O wait time was less than 5 to 7%, the CPU was free most of the time, and si & so in vmstat were 0. Seeing that, I went back to the original values and re-tested with DST as 12 and MST as 8; it took 43 hours to complete. What could be the issue in this case?

Only a couple of things were a bit suspicious:
1. Swap memory (2 GB) shows as full in htop during the current runs, but 3 weeks back it was showing as completely free. Even with swap full, though, the 2-table DB test that I ran 2 days back and again today gives the same result of around 65 minutes, which is good. It only goes wrong when I run against the large DB.
2. The process state is mostly 'D' (uninterruptible sleep), but as I said, I/O wait time is only around 5 to 7%.

What could be the reason? What more should I monitor to get this issue resolved?
 

Rob Fitzpatrick

1)
For -datascanthreads, try 1.5 * cores; 12 in your case.

The second parameter is -mergethreads, not -mergescanthreads. The parameters that work together in the sort/merge phase of idxbuild are -mergethreads and -threadnum. The maximum number of concurrent threads you can have in a given area is -mergethreads * -threadnum + -threadnum + 1. The -threadnum parameter determines how many sort groups will be worked on (sorted/merged) in parallel. The -mergethreads parameter determines how many threads will run concurrently within each sort group. Each sort group has a parent thread that coordinates the activity of the child threads, and the parent proutil thread does the b-tree key insertion once each sort group has been sorted and merged.

For this pair of parameters, the guidance I have heard is also 1.5 threads per core. For example, -threadnum 2 -mergethreads 5 would give you a max of 13 threads (2 * 5 + 2 + 1).
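As a concrete sketch for your 8-core box (paths reused from your own command; treat the values as a starting point to test, not a recommendation):

# 2 sort groups * 5 merge threads + 2 parents + 1 proutil thread = 13 threads max
sudo /opt/dlc/bin/proutil /localdisk/xyzdb/xyz.db -C idxbuild all -i -threadnum 2 -mergethreads 5 -datascanthreads 12 -TF 80 -T /localdisk/tmpForidx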

2)
For -B, the guidance is to select a small value to free up memory for -TF buffers. Try -B 1024.

3)
Offhand, I'm not sure what your issue is.
sudo /opt/dlc/bin/proutil /localdisk/xyzdb/xyz.db -C idxbuild all -i -TB 64 -TM 32 -TMB 512 -SG 64 -thread 1 -TF 80 -datascanthreads 12 -mergethreads 8 -T /localdisk/tmpForidx -B 5000000
  • Reduce -B, as per above. It is reducing memory available for sort/merge buffers (-TF) and may cause additional disk I/O (see below).
  • Add -pfactor 80. If you don't specify it, it defaults to 100, which will result in expensive block splits once the application restarts and does creates and updates.
  • Add -rusage. It will cause (11480) and (11483) messages to be written to stdout and the db log file telling you how much I/O was done to temp files (-T or -SS). E.g.:
    DBUTIL : (11480) Temporary sort file at /u/db/11.5/pac used up 0K of disk space.
    DBUTIL : (11483) A total of 0K of temporary sort disk space was used for area 6.
    If these numbers are larger than 0 KB then you are doing extra disk I/O: write I/O during the data scan/key build phase, read/write during the sort/merge phase, and read I/O during the b-tree insertion phase.
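    If you redirect stdout to a file, a quick way to check for temp space usage afterwards is something like this (log path is just an example):

    grep -E '\((11480|11483)\)' /localdisk/logs/idxbuild.log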
The guidance above is a mix of theory and practice. But every system and release is different, and what is best for one DB may not be best for another. Ultimately, practice trumps theory: whatever works best for you may be better than what I have listed above. So the key to tuning idxbuild is repeated planning, testing, and documentation of test results (see the test-harness sketch after this list). The keys in my opinion are:
  • Multi-thread data scans where possible (-datascanthreads);
  • Give as much memory to -TF as possible without causing swap I/O;
  • Multi-thread sort/merge (-threadnum/-mergethreads);
  • If you cannot eliminate extra disk I/O (i.e. I/O apart from data scan reads and b-tree writes), use -T or -SS to keep I/O away from database disks. Make the most of your available I/O bandwidth.
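As promised above, a minimal test-harness sketch (the DB path, temp dir, and parameter value lists are placeholders for your environment):

#!/bin/bash
# Time idxbuild across -datascanthreads/-mergethreads combinations and log the results.
for dst in 12 16 32; do
  for mt in 8 12 20; do
    start=$(date +%s)
    /opt/dlc/bin/proutil /localdisk/testdb/test.db -C idxbuild all -i -datascanthreads $dst -mergethreads $mt -TF 80 -B 1024 -rusage -T /localdisk/tmpForidx
    echo "DST=$dst MT=$mt elapsed=$(( $(date +%s) - start ))s" >> idxbuild_tests.log
  done
done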
 
Thanks Rob, as usual an excellent explanation of the datascanthreads/mergethreads parameters and how to set optimal values for them.

On question 3: sudo /opt/dlc/bin/proutil /localdisk/xyzdb/xyz.db -C idxbuild all -i -TB 64 -TM 32 -TMB 512 -SG 64 -thread 1 -TF 80 -datascanthreads 12 -mergethreads 8 -pfactor 80 -rusage -T /localdisk/tmpForidx -B 1024

I ran a test with the above command but it didn't work - I mean, my issue hasn't been resolved. The first table is the largest one; it is in a separate area and has 4 indexes. As I said, it took around 48 minutes to build the 4 indexes of the large table under normal circumstances, whereas this run has gone past an hour and a half and hasn't even completed 1 of the 4 indexes of that table - the issue is somewhere else. If there is something else I need to look at, please do let me know.

From top -c:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6624 root      20   0 86.221g 0.082t   2840 D  10.6 66.9 113:26.60 /opt/oe/dlcOE116/bin/_proutil /localdisk/xyzdb/xyz.db -C idxbuild all -i -TB 64+

[sbalasub@abcbox ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           125G         85G        593M         83M         40G         40G
Swap:          2.0G        2.0G          0B
 

Rob Fitzpatrick

For a given area, idxbuild all is made up of four distinct phases: index deletion, data scan/key build, sort/merge, and b-tree insertion. Unfortunately the logging isn't ideally detailed, but you can see the beginning and end of each phase, and most of the details are there. Try to determine not just the total run time but also the length of each phase, given a particular set of parameters. That may give you a clue about which parameters to tune to get faster runs in future.
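If you capture the idxbuild output to a file, you can pull the phase markers and timings out with something like this (the file name is just an example):

grep -E 'Phase|Sorting index group|Building|Elapsed time' /localdisk/logs/idxbuild.log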

Also, as I said, look for the 11480 and 11483 messages to see if you're using temp file space. In some cases, it can actually be beneficial to break the process out into several idxbuild commands that each build a subset of the indexes, so that the sort buffers can remain memory-resident.

One other thing: the first phase (index deletion), while it shouldn't take too long, can in some cases be completely eliminated. If you are rebuilding all the indexes in an area, you can first truncate the area (proutil dbname -C truncate area area-name) to free up the blocks, so idxbuild doesn't have to do it. Before taking this approach, please programmatically confirm that the area only contains indexes to be rebuilt; no other indexes, and no tables. ;)
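For example, using your paths (the area name here is hypothetical; substitute your real index area name):

proutil /localdisk/xyzdb/xyz.db -C truncate area "Index_Area"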
 
Before taking this approach, please programmatically confirm that the area only contains indexes to be rebuilt; no other indexes, and no tables.
As per the current design, it's Type II storage and I can say for sure that data, indexes, and word indexes are all in different areas.

Also, as I said, look for the 11480 and 11483 messages to see if you're using temp file space.
If I am right, I will only see these for the first area after about 8 hours in the current run (it's still running). I will keep you posted on this.

If I break down the last run's results to see which phase is consuming so much time, then in my case I would say it's the 'Build' phase; below are logs from the index build.

Phase 1, processing block number 25288704 of 25289215 in area 100. (8367)
Phase 1, processing block number 25288960 of 25289215 in area 100. (8367)
Area 100: Multi-threaded record scan (Type II) complete. Elapsed time: 742.797
Start 4 threads for the area. (14536)
Sorting index group 0
Sorting index group 1
Sorting index group 2
Sorting index group 3
Spawning 20 threads for merging of group 0.
Spawning 20 threads for merging of group 1.
Spawning 20 threads for merging of group 3.
Spawning 20 threads for merging of group 2.
Sorting index group 0 complete. Elapsed time: 213.541
Building index 1856 (prim) of group 0 (16757)
Sorting index group 2 complete. Elapsed time: 222.786
Sorting index group 3 complete. Elapsed time: 274.196
Sorting index group 1 complete. Elapsed time: 289.126
Building of indexes in group 0 completed. Elapsed time: 7873.494
Building index 1857 (file-name) of group 1 (16757)
Building of indexes in group 1 completed. Elapsed time: 3017.562
Building index 1858 (rev) of group 2 (16757)
Building of indexes in group 2 completed. Elapsed time: 5997.163
 

Rob Fitzpatrick

sudo /opt/dlc/bin/proutil /localdisk/xyzdb/xyz.db -C idxbuild all -i -TB 64 -TM 32 -TMB 512 -SG 64 -thread 1 -TF 80 -datascanthreads 12 -mergethreads 8 -pfactor 80 -rusage -T /localdisk/tmpForidx -B 1024
Spawning 20 threads for merging of group 0.
Spawning 20 threads for merging of group 1.
Spawning 20 threads for merging of group 3.
Spawning 20 threads for merging of group 2.
It looks like this run had different parameters from those quoted above. Can you post the idxbuild parameters you used to get these results?

To me this looks like a lot of threads for an 8-core machine to handle. Multi-threading is helpful to a point but with large numbers it adds context-switching overhead.
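Taking that log at face value (4 sort groups, 20 merge threads each), the formula above gives a peak of 4 * 20 + 4 + 1 = 85 threads, versus the 1.5 threads per core = 12 suggested earlier.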

I believe idxbuild is choosing its own value for -threadnum since you haven't specified it. It defaults to the number of cores. You should specify it explicitly. You don't need -thread 1 as that's already the default value with the Enterprise license.
 
Sorry for the confusion, Rob.

As I mentioned before, this is the log from the day before yesterday [sudo /opt/dlc/bin/proutil /localdisk/xyzdb/xyz.db -C idxbuild all -i -TB 64 -TM 32 -TMB 512 -SG 64 -thread 1 -TF 80 -datascanthreads 12 -mergethreads 8 -T /localdisk/tmpForidx -B 5000000]

Spawning 20 threads for merging of group 0.
Spawning 20 threads for merging of group 1.
Spawning 20 threads for merging of group 3.
Spawning 20 threads for merging of group 2.

The new run as per our discussion, on the largest table (I haven't specified -threadnum yet): [sudo /opt/dlc/bin/proutil /localdisk/xyzdb/xyz.db -C idxbuild all -i -TB 64 -TM 32 -TMB 512 -SG 64 -thread 1 -TF 80 -datascanthreads 12 -mergethreads 8 -pfactor 80 -rusage -T /localoacis/tmpForidx -B 1024 >> /localdisk/logs/cycle2_index_process_mar_10v1.log 2>&1]

Start 4 threads for the area. (14536)
Sorting index group 0
Spawning 8 threads for merging of group 0.
Sorting index group 2
Spawning 8 threads for merging of group 2.
Sorting index group 1
Spawning 8 threads for merging of group 1.
Sorting index group 3
Spawning 8 threads for merging of group 3.
Sorting index group 0 complete. Elapsed time: 198.677
Building index 1856 (prim) of group 0 (16757)
Sorting index group 2 complete. Elapsed time: 214.450
Sorting index group 3 complete. Elapsed time: 273.471
Sorting index group 1 complete. Elapsed time: 281.169
Building of indexes in group 0 completed. Elapsed time: 9844.196
Building index 1857 (file-name) of group 1 (16757)
Building of indexes in group 1 completed. Elapsed time: 3025.873
Building index 1858 (rev) of group 2 (16757)
Building of indexes in group 2 completed. Elapsed time: 6034.224
Building index 1859 (sourceTime) of group 3 (16757)
Building of indexes in group 3 completed. Elapsed time: 2687.079
Multi-threaded index sorting and building complete. Elapsed time: 21791.551

Resource usage: CPU user 4135.103353, system 299.929271
Resource usage: DISK reads: 30329584 KB at 1391 KB/sec, writes: 38526632 KB at 1767 KB/sec
4 indexes were rebuilt in area 100. Total elapsed time: 22410.116 (16171)

Resource usage: CPU user 8433.878374, system 455.971994
Resource usage: DISK reads: 232256195 KB at 10 MB/sec, writes: 38526632 KB at 2 MB/sec

Temporary sort file at /localdisk/tmpForidx/ used up 0K of disk space. (11480)
A total of 0K of temporary sort disk space was used for area 100. (11483)


The only difference I see between my initial run (on the same DB, which took 5.8 hours) and the current run is the 'Build' phase, which consumes way too much time. As you mentioned, I will specify -threadnum in my next test. Considering that proutil defaults -threadnum to 1 per core, shall I go ahead and test with 8 (4 CPUs * 2 cores each in my case), or would you suggest some other number for the test?

To me this looks like a lot of threads for an 8-core machine to handle. Multi-threading is helpful to a point but with large numbers it adds context-switching overhead.

True Rob - I agree, but as I said, when I tested with a DB that had only the largest table on the same localdisk, the higher DST & MT values took just 48 minutes for the complete index rebuild - that's where I got my confidence and ran it on the DB with 950 tables. To my surprise, the largest table, which is the first one in area 100, is now taking 6 hours for just the Build phase (the other phases take under 10 minutes). Why does the single-large-table DB exercise take so much less time - any specific reason?

I believe idxbuild is choosing its own value for -threadnum since you haven't specified it. It defaults to the number of cores. You should specify it explicitly. You don't need -thread 1 as that's already the default value with the Enterprise license.

I was told by senior DBAs that if we give -thread as 1 and don't specify any value for -threadnum, it will default to the number of cores, which is better; that's the reason I had it like this. As you suggested, I will remove -thread 1. In my case, what value would you advise for -threadnum for my next run?
 
A little more of an update: over the weekend I had to build a new environment on the same local disk. A fresh load of the 1.5 TB DB took 6 hours and the index build completed in 5.7 hours - same parameters for load and index build, and almost the same set of data. As the previous logs show, the index sorting & build phase of area 100 took 6 hours, whereas on the new environment it took just 39 minutes.

Five of the elapsed-time lines below are very scary when I compare them with the previous log.

Resource usage: DISK reads: 201943811 KB at 295 MB/sec, writes: 0 KB at 0 MB/sec
Start 4 threads for the area. (14536)
Sorting index group 0
Sorting index group 1
Sorting index group 2
Sorting index group 3
Spawning 8 threads for merging of group 3.
Spawning 8 threads for merging of group 2.
Spawning 8 threads for merging of group 1.
Spawning 8 threads for merging of group 0.
Sorting index group 0 complete. Elapsed time: 206.551
Building index 1856 (prim) of group 0 (16757)
Sorting index group 2 complete. Elapsed time: 223.105
Sorting index group 3 complete. Elapsed time: 275.046
Sorting index group 1 complete. Elapsed time: 284.448
Building of indexes in group 0 completed. Elapsed time: 334.180
Building index 1857 (file-name) of group 1 (16757)
Building of indexes in group 1 completed. Elapsed time: 310.698
Building index 1858 (revision) of group 2 (16757)
Building of indexes in group 2 completed. Elapsed time: 278.445
Building index 1859 (sourceTime) of group 3 (16757)
Building of indexes in group 3 completed. Elapsed time: 279.164
Multi-threaded index sorting and building complete. Elapsed time: 1410.871

Resource usage: CPU user 2737.979416, system 37.285459
Resource usage: DISK reads: 240 KB at 0 MB/sec, writes: 68855728 KB at 48 MB/sec
4 indexes were rebuilt in area 100. Total elapsed time: 2079.209 (16171)

Resource usage: CPU user 7174.676142, system 209.010558
Resource usage: DISK reads: 201944051 KB at 95 MB/sec, writes: 68855728 KB at 32 MB/sec


Temporary sort file at /localoacis/tmpForidx1/ used up 0K of disk space. (11480)
A total of 0K of temporary sort disk space was used for area 100. (11483)

What could be the reason for the huge difference - any thoughts? The only difference is that the current log is from a net new DB, whereas the previous logs are from a DB on which the index build had been run for the 3rd time in the last week.
 

Rob Fitzpatrick

Sorry for the delay, real life gets in the way sometimes...

As you point out, the first run showed very low values for disk read and write throughput. Temp space wasn't the issue as you didn't use any in either case.

In theory, I'd expect the second run to be faster (though perhaps not by much), since the database was just loaded and therefore has very low logical scatter and record fragmentation, so (at least for the dump index) I would expect sorting to be quicker. But in fact sorting was a bit quicker the first time, and the big difference between the two was in b-tree insertion ("Building of indexes in group").

That phase reads from the merge buffers, which in your case were memory-resident, and writes to disk. As you point out, -rusage shows that read and write throughput were an order of magnitude higher in the second run. This makes me think the difference in run times lies in your disk subsystem.
 
No probs, Rob - I am quite happy to wait. :)

In this case I feel like I could just do a restore from the hot-spare database, which will take hardly 6 hours, rather than rebuilding the indexes for 32 hours. I would like to know what your approach would be if you were in my situation. Please advise.

Also, I would like to know: have you ever faced a situation in prod where you ended up doing an "index build all" other than when setting up a net new environment?
 

Rob Fitzpatrick

In this case I feel like I could just do a restore from the hot-spare database, which will take hardly 6 hours, rather than rebuilding the indexes for 32 hours. I would like to know what your approach would be if you were in my situation.
Let's back up a step. You began the thread talking about how to optimize an index rebuild. If restoring a backup is an option, what is the intent of rebuilding indexes?

Also, I would like to know: have you ever faced a situation in prod where you ended up doing an "index build all" other than when setting up a net new environment?
I'm not sure I understand exactly what you mean. Are you asking if I have done idxbuild all instead of dump/load/idxbuild all in order to improve application performance?
 
Let's back up a step. You began the thread talking about how to optimize an index rebuild. If restoring a backup is an option, what is the intent of rebuilding indexes?

In this case, yes, I can do a restore, but I don't want to until I understand this issue. The reason, basically, is that as I pointed out before, the disk reads show 10 MB/sec and disk writes show 2 MB/sec (I don't know the reason yet), whereas the other location on the same disk shows 95 MB/sec reads and 32 MB/sec writes. I am not sure if this is something I have to worry about, but what happens if application performance gets slower after a restore onto the location where disk reads and writes are slow right now? Is that possible, or is it an unnecessary worry on my side?


Are you asking if I have done idxbuild all instead of dump/load/idxbuild all in order to improve application performance?

I have used 'idxbuild some' when I have had to add an index to a large table or handle a data-corruption scenario, and I have used 'idxbuild all' when I have had to do a fresh environment setup with a binary dump and load. My intention was to ask whether, in your experience, there is any other scenario where we would need to go for idxbuild.
 

Rob Fitzpatrick

Same here. I will rebuild or compact some indexes, e.g. after a large purge of data from a particular table, to make them physically smaller and improve the efficiency of reading them.

I do think you should look further into your disk situation. I wouldn't want to have any database, particularly a large one, on media that can only do 10 MB/s reads and 2 MB/s writes. Try to reproduce that result outside of OpenEdge, e.g. with a dd command or something like that, so that whoever is responsible for storage can't just write off the problem by saying "Progress is slow".
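A minimal sketch with dd (file path and size are just examples; the direct flags bypass the page cache so you measure the disk rather than RAM):

# sequential write test, ~4 GB
dd if=/dev/zero of=/localdisk/ddtest.bin bs=1M count=4096 oflag=direct
# sequential read test
dd if=/localdisk/ddtest.bin of=/dev/null bs=1M iflag=direct
rm /localdisk/ddtest.bin

dd reports the throughput it achieved when it finishes; run it against the same filesystem that holds the database extents.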
 