Help BI architecture

kdefilip

Member
Hi All

I'm a bit confused re BI

We have a directory which contains our BI files, six in total, which seem to grow to 2 GB each.

Does each of these files represent one cluster?

Why, in promon and proutil, do I seem to have a disparity between BI size and BI cluster size? (See attached.)
[Attached screenshots: promon and proutil output showing BI size and BI cluster size]
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
The number and size of the extents (files) that make up the BI are determined by the database structure, which is specified with structure files and prostrct commands. The sum of all these extents makes up "the BI file" (a.k.a. "the before-image area", a.k.a. "the primary recovery area").
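For illustration only (paths and sizes invented, not taken from your database; check your own dbname.st), the BI area in a structure (.st) file might look something like:

  b /db/bi/mydb.b1 f 2097152
  b /db/bi/mydb.b2 f 2097152
  b /db/bi/mydb.b3

Each "b" line is one before-image extent; "f" plus a size in KB marks it as fixed, and the last line with no tokens is the variable extent. prostrct create (for a new database) or prostrct add (for an existing one) is what turns lines like these into files on disk.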

BI blocks and clusters are unrelated to the number or size of the files. BI block size (in KB) is chosen with proutil dbname -C truncate bi -biblocksize n. BI cluster size (also in KB) is chosen with proutil dbname -C truncate bi -bi n. BI clusters are logical structures within the BI file.
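For example, with the database shut down (the values here are just illustrations, not recommendations):

  proutil mydb -C truncate bi -biblocksize 16 -bi 16384

That would set 16 KB BI blocks and 16384 KB (16 MB) BI clusters.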

The various screens where you see BI block size and cluster size are all telling you the same thing. Proutil describe is a little confusing as it gives you cluster size in units of 16 KB instead of KB, but if you do the math (1024 * 16 KB = 16 MB) it's the same size you see in promon screen 7 or R&D 1 9.

When you talk about the size of the BI file, there are two ways to look at it: the physical size (sum of the sizes of the extents) and the logical size (the number of active BI clusters * the BI cluster size). Promon 5 and R&D 1 9 both show you physical size, just in different units.
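As a rough worked example using the numbers mentioned in this thread (and assuming only the four default clusters are active after a truncate):

  physical size = sum of extent sizes            = 6 * 2 GB      = 12 GB
  logical size  = active clusters * cluster size = 4 * 16384 KB  = 64 MB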

When you truncate the BI file (proutil dbname -C truncate bi) you are changing the file logically. And if the variable BI extent had grown, or was the only extent, the command also changes that file physically.

The command commits the changes contained in the BI file to the database extents so that the physical structure of the BI file can be reset back to what is dictated in the structure file. The next time you open the database after a truncate bi, it allocates four BI clusters. If they fit within the fixed-size extents then the files don't have to grow. If they don't, or if there is only a variable extent, the variable extent grows to a size large enough to fit four BI clusters. If you issue a proutil bigrow n command after truncate bi, it allocates n more clusters in addition to the original four.
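For example, if you wanted to pre-format clusters up front rather than pay that cost at run time (the count here is just an illustration):

  proutil mydb -C truncate bi
  proutil mydb -C bigrow 12

After that the BI would contain 4 + 12 = 16 formatted clusters.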

In your case, if you have six extents of 2 GB each, they don't "grow". They are fixed-size; that's what the "f" in the structure file line items means. They stay at the allocated size (in KB) no matter what you do, including truncate bi. You can have at most one variable-size extent per storage area and it must be the last extent. If an extent isn't marked as "f" or "v" then it is variable.
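If you want to see exactly how your BI extents are defined, you can have the database write its current structure back out to a file (the output name here is just an example):

  prostrct list mydb current.st

Then look at the "b" lines in current.st: fixed extents carry "f" and a size in KB; a line without them is the single variable extent.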
 

kdefilip

Member
Okay, thanks. I'm just trying to get my head around why we are having so many partial writes on the BI. I see 12 GB worth of physical BI files (five are "f" and one is "v"); one place tells me the BI is 8166M, and proutil says something a little different, but effectively the same. Cluster size is set at 16384 and yet we continue to have many partial writes. I don't get it. -Mf is set at 3, but after reading the Progress documentation on -Mf, I'm still not sure if that is the cause. And we still have many BI buffer busy waits despite the fact that the BI is now isolated on its own disk. I'm quite baffled at this point.

Additionally, I'm not sure how many BI clusters we have, but if we are at the default of 4, why does it take 12 GB of physical files to service 4 clusters of 16384 each?
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
why does it take 12 GB of physical files to service 4 clusters of 16384 each?
It doesn't. 4 clusters * 16 MB/cluster = 64 MB. When the database needs to add a new BI cluster (and whether it does is a function of your application and your user activity), that will be done within the fixed extents, so no new file-system allocation needs to happen. It is just a matter of formatting the BI blocks within the new cluster. That is why some people choose to use fixed extents: you don't take the run-time hit of allocating new clusters and extending the file. Whether that is a measurable benefit these days is another discussion.
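Putting rough numbers on it, using the sizes mentioned in this thread:

  cluster size                   = 16384 KB        = 16 MB
  logical BI size after truncate = 4 * 16 MB       = 64 MB
  room in the fixed extents      = 12 GB / 16 MB   = ~768 clusters

So the 12 GB is headroom the BI can grow into without extending any file, not space that is all in use right now.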

And still have many BI buffer busy waits despite the fact that BI is now isolated on its own disk.
I believe busy buffer waits will be a function of transaction volume, which you can't do a lot about.
 

TheMadDBA

Active Member
BI buffer busy waits refer to a shortage of, or contention for, the memory structures set by -bibufs. No relation to I/O issues. You are going to have some on a busy system no matter what... if you are getting hundreds a second it could be something to worry about. A few a second is nothing.
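If you do see sustained hundreds per second, the knob TheMadDBA is referring to is a broker startup parameter; for example (the value is only an illustration, not a recommendation):

  proserve mydb -bibufs 64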
 

kdefilip

Member
Hi again
Thankfully, after the changes we made the other night, all performance indicators look much better during work hours.
  • BI buffer busy waits are way down
  • Resource waits and timeouts are way down
  • OM lock/latches are now non-existent, zero across the board
  • Checkpoints are now happening at reasonable intervals with no increase in duration or sync time
  • Lowering spin has not caused a degradation in buffer hit ratios or any impact on CPU. I expected naps to be way up as a result and they are not: there has been an increase in naps, but it is marginal and one I expected.
  • All latch wait times are way down across the board without any great increase in spin waits
We certainly have more to do, but I am more than pleased with the result thus far.
And I appreciate all your (and others) help in deciphering all this progress stuff.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
Do you have any reports of, or metrics on, how the application is running compared to before the changes? Ultimately, that's what matters.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
It depends on the application. If you have a front-end UI, how has it changed? If you have batch updates or reports, how have they changed?
 

kdefilip

Member
It depends on the application. If you have a front-end UI, how has it changed? If you have batch updates or reports, how have they changed?
So we have a few things here. Due to the various network setups at our various geographic sites, our complaints fall into three categories or groups. For the sites on decent network connections, the complaints have been mostly momentary waits for screens and just general slowness between aspects of their workflow. Those folks, as well as those who are not geographically separated from the hardware, are now reporting that the app is "snappier" and "crisper". Yes, fuzzy evidence at best, but it's something. Unfortunately, at the keyboard we are often dealing with "feelings" rather than real evidence. We have not been back to those sites with a stopwatch yet, but that is on the radar for the very near future.

The next two groups fall into different, but similar, categories: one group is on MPLS connections, the other on metronet. As I understand it, these networks are more like a hub-and-spoke kind of setup, and each can be oversubscribed. In other words, if 100 Mb has been purchased and you have five sites on a particular metronet, it would be great if each only burst to 20 Mb. However, in this scenario, at any given time one site may pull a large risc/pac study and swamp the entire 100 Mb, impacting all five of these locations. In both of these groups, this situation is being addressed. However, until it is resolved, nothing I do on the back end will help, and any measurement at those locations is meaningless. In one case we have doubled the bandwidth, but we still see situations where one site can consume all of it. In each of these situations we are moving to point-to-point connectivity rather than this hub setup, but that will take some time.
 