AI and BI disc i/o

ron

Member
I've been collecting statistics using VSTs for a few weeks. I'm curious about what's happening with AI and BI writes ... and thought someone might care to comment.

During a particular 15-minute period when batch processing was running last night:

AI: Partial Writes.... 17,785
AI: Total Writes...... 17,785
AI: AIW Writes....... 17,467
AI: Busy Buff Waits.. 167

BI: Partial Writes.... 8,768
BI: Total Writes...... 26,281
BI: BIW Writes....... 14,128
BI: Busy Buff Waits. 10,940

I've noticed that whether the system load is heavy or light, the exact same pattern persists:

(1) For AI the three "write" figures are very close. In particular, partial and total writes are nearly always identical, and AIW writes are always 95% or more of all writes. Busy Buffer Waits is always very small.

(2) For BI, partial writes are always about a third of total writes, BIW writes are always about 50% of total writes, and Busy Buffer Waits is always very high.

Why is there such a difference between the figures for AI and BI?

(I appreciate that the high incidence of partial writes is exacerbated by having the AI and BI block size set to 16K ... and in the near future we will be seeing what happens if that is reduced to 8K.)
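
(For anyone curious where the numbers come from: they're deltas read from the _ActAILog and _ActBILog VSTs. A rough sketch of the kind of sampling code involved is below - the field names are typed from memory, so please check them against the VST reference before relying on them.)

/* Sample the AI/BI activity VSTs over 15 minutes - field names from memory! */
DEFINE VARIABLE ai-writes AS INTEGER NO-UNDO.
DEFINE VARIABLE bi-writes AS INTEGER NO-UNDO.

FIND FIRST _ActAILog NO-LOCK.
FIND FIRST _ActBILog NO-LOCK.
ASSIGN ai-writes = _ActAILog._AiLog-TotWrites
       bi-writes = _ActBILog._BiLog-TotalWrts.

PAUSE 900 NO-MESSAGE.    /* 15 minutes */

FIND FIRST _ActAILog NO-LOCK.
FIND FIRST _ActBILog NO-LOCK.
DISPLAY "AI total writes" _ActAILog._AiLog-TotWrites - ai-writes SKIP
        "BI total writes" _ActBILog._BiLog-TotalWrts - bi-writes.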

Environment:

Sun V480 + Solaris 8 + Progress 9.1D06 + 16 GB memory.

DB: Striped across 8 x 36 GB discs; 8K block size. (3 x APW)
AI: Dedicated 36 GB disc; 16K block size. (AIW + 30 aibufs)
BI: Dedicated 36 GB disc; 16K block size. (BIW + 30 bibufs)

-B 27500
 
As usual - I'll answer your question with a question! Is 2 phase commit in operation?

The only number in this group that worries me is the BI Busy Buf waits.

17,785 AI writes in 15 minutes is about 19/s, which is easily doable by 5400 RPM IDE drives, so any halfway decent SCSI system should be fine!

Even your BI writes are only 29/s, which should also be achievable.

If your disks are genuinely reserved for a specific bi file then this will help.

Busy bi buffers will be far more of a problem, since BI and AI writes are synchronous.

My suggestions (FWTW) - with a rough sketch of the startup line after the list:
1. Double -bibufs - or, if combined with a reduction in block size, then quadruple! (I can't remember whether bibufs is expressed in BI block size or DB block size.)
2. If NOT using 2PC then increase -Mf to 5 or 6
3. If using 2PC then review -groupdelay (I suspect that you already have this enabled but it could be larger....)
4. Set the -groupdelay on all the databases if you are using 2PC, since delays on one database will necessarily affect the other.
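
For the non-2PC case, the startup-line sketch I mean is something along these lines (the numbers are purely illustrative, not a recommendation - tune them against your own stats):

proserve dbname -B 27500 -bibufs 60 -Mf 5

That's bibufs doubled from your current 30 and -Mf lifted above the default; if you also drop the BI block size to 8K, push bibufs up again.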

You've only given us stats for one db (I think) - anything interesting on any of the others that may point the way?

And I still think that the -B is significantly too low for this system! As an example - Sun E250, 2 CPU, 1GB memory, Primary DB -B = 32768 = 256MB Buffer space
 

ron

Member
toby.Harman said:
As usual - I'll answer your question with a question! Is 2 phase commit in operation?

The only number in this group that worries me is the BI Busy Buf waits.

17,785 AI writes in 15 minutes is about 19/s, which is easily doable by 5400 RPM IDE drives, so any halfway decent SCSI system should be fine!

Even your BI writes are only 29/s, which should also be achievable.

If your disks are genuinely reserved for a specific bi file then this will help.

Busy bi buffers will be far more of a problem, since BI and AI writes are synchronous.

My suggestions (FWTW)
1. Double -bibufs - or, if combined with a reduction in block size, then quadruple! (I can't remember whether bibufs is expressed in BI block size or DB block size.)
2. If NOT using 2PC then increase -Mf to 5 or 6
3. If using 2PC then review -groupdelay (I suspect that you already have this enabled but it could be larger....)
4. Set the -groupdelay on all the databases if you are using 2PC, since delays on one database will necessarily affect the other.

You've only given us stats for one db (I think) - anything interesting on any of the others that may point the way?

And I still think that the -B is significantly too low for this system! As an example - Sun E250, 2 CPU, 1GB memory, Primary DB -B = 32768 = 256MB Buffer space

No ... we don't use 2PC.

Statistics for the other databases are collected and, frankly, they are most uninteresting. The dominance of the BDB database is more than 10:1.

Yes ... the AI and BI writes don't stress the i/o system at any time. During the same 15 minutes as the other figures were collected:

Log reads ... 6,447,367
Log writes ... 2,405,140
OS reads ..... 234,115
OS writes ..... 183,963
and the database metadisc was 91% busy.

So the database i/o was the constraint during that period.

Setting -B to 27500 may well seem too low ... but it was recently upped to 110000 for several days and there was no detectable change to any metric. Buffer hit ratio stayed the same, job execution times stayed the same ... everything stayed the same! Maybe it's because the billing system data is so widely distributed that we can't keep enough in the buffers to make it worthwhile.
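
(For what it's worth, those figures imply a buffer hit rate of roughly (6,447,367 - 234,115) / 6,447,367, i.e. about 96% - so perhaps there simply isn't much left for a bigger -B to claw back.)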

I'll get back to you re the other issues.

Thanks,
Ron. :drink:
 
Add a 0 to the -B and you may get closer!

Seriously - if memory serves (eeek) you are allowed 21 shared memory segments per database (a broker limitation), and Solaris 8/Progress 9.1D06 32-bit has (check this, please) a limit of 128MB per shared memory segment.

So /etc/system should have a setting in it of set shmsys:shminfo_shmmax=134217728

21 * 134217728 = 2818572288 = 2.625GB
Divide by block size (8096) = 344604

Allowing for Lock tables etc I would set it to about 300000.
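
(You can sanity-check what the broker has actually grabbed with ipcs -a on the box - the shared memory segments owned by the account that starts the broker should add up to roughly -B times the block size, plus the lock table and other shared structures.)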

If you are not running 2PC then definitely increase -Mf

Question for the sysadmin - is priority paging in use? Check the Progress kbase on this, but the recommendation from on high (Gus) is that you set this on always for database servers - and make sure that the database does NOT have the execute permission set on the data extents.

Enjoy!
 

ron

Member
toby.Harman said:
Add a 0 to the -B and you may get closer!

Seriously - if memory serves (eeek) you are allowed 21 shared memory segments per database (a broker limitation), and Solaris 8/Progress 9.1D06 32-bit has (check this, please) a limit of 128MB per shared memory segment.

So /etc/system should have a setting in it of set shmsys:shminfo_shmmax=134217728

21 * 134217728 = 2818572288 = 2.625GB
Divide by block size (8096) = 344604

Allowing for Lock tables etc I would set it to about 300000.

If you are not running 2PC then definitely increase -Mf

Question for the sysadmin - is priority paging in use? Check the Progress kbase on this, but the recommendation from on high (Gus) is that you set this on always for database servers - and make sure that the database does NOT have the execute permission set on the data extents.

Enjoy!

Thank you Toby - although I think you're pushing me towards "Information Overload"!

With regard to priority paging ... it looks like things have changed with Solaris 8, because Sun now say:
--------------------------------------------------------------------------------
Caution -
We recommend that all tuning of the VM system be removed from /etc/system. Run with the default settings and determine if it is necessary to adjust any of these parameters. Do not enable priority_paging or adjust cachefree. These are no longer needed, although still present in the kernel. Manipulating them will almost certainly result in performance degradation when the page scanner runs.
--------------------------------------------------------------------------------

In our /etc/system we have:

set shmsys:shminfo_shmmax=314572800

We'll have another go at increasing -B to as high as we can manage ... to see what happens.

I'll also see about setting -Mf above the default. I already had my eye on that one. Dan's docs aren't enthusiastic about its usefulness ... but logic tells me that there must be some potential there.

Ron.

(BTW: 2**13 = 8192, not 8096! Is mixing 4096 and 8192 a "mixed metatwo"?)
 
Welcome to the new improved Memory Manager in Ron's Head!

Priority paging was recommended for Solaris 2.6 and 7, but they said they were going to fix it for "later versions".

See KBASE 21207 and also Sun's own website (again, old references): http://www.sun.com/sun-on-net/performance/priority_paging.html

-Mf - we used it and it gave us a bit, but we got significantly more out of -bibufs

Yeah, yeah - all right so I can't count - or even multiply!

:blush1:
 

ron

Member
toby.Harman said:
Welcome to the new improved Memory Manager in Ron's Head!

Priority paging was recommended for Solaris 2.6 and 7, but they said they were going to fix it for "later versions".

See KBASE 21207 and also Sun's own website (again, old references): http://www.sun.com/sun-on-net/performance/priority_paging.html

-Mf - we used it and it gave us a bit, but we got significantly more out of -bibufs

Yeah, yeah - all right so I can't count - or even multiply!

:blush1:

... nor exponentiate! (Sorry, couldn't resist.)

The Sun Solaris 8 reference is:

docs.sun.com: Solaris 8 2/02 Update Collection >> Solaris Tunable Parameters Reference Manual >> 2. Solaris Kernel Tunables >> Paging-Related Tunables

http://docs.sun.com/db/doc/816-0607/6m735r5eq?a=view

I checked out the Progress and Sun references - but it appears that with Solaris 8 one shouldn't meddle with the paging parameters. Sun obviously believe they have "got it perfect".

I will have to leave tuning alone for a week or two - I have to commission the new AI system. But be assured that a lot more tests will be done on different argument values.
 

dje

New Member
You may want to make sure that the AI and BI block sizes are the same.

As a side note, using AI almost always forces BI partial writes to increase. This is because when you write an AI block, the database has to make sure that any notes in that AI block have first been written to the BI log; because of the BI buffering that takes place when -Mf is not zero, that might not yet be the case, so an AI write is not uncommonly preceded by a BI write. Also, not all BI notes are copied to the AI log, so the two often get out of sync.

If you've set the BI block size but not the AI block size, that exacerbates the problem. I once saw a situation where -biblocksize had been set to 16 and -aiblocksize had been left at 1, so each BI block was being written 16 times! Eugh.

As the transaction load increases you're more likely to have one or more full blocks on your BI buffer list, so partial writes caused by AI synchronisation issues shouldn't bother you so much provided the AI and BI block sizes are the same (or the AI block size is larger than the BI block size).

- David Eddy
 

ron

Member
dje said:
You may want to make sure that the AI and BI block sizes are the same.

- David Eddy

Thanks for that, Dave ("Junior Progress Talker" ... te, he, he!)

The AI and BI block sizes are both set to 16K ... -bibufs and -aibufs are both set to 50 (they were both 30 until a week back).

I understand what you're saying, Dave, and I certainly didn't expect the details for AI and BI to be identical. But I didn't expect them to be anywhere near as different as they are, either!

I guess the bottom line is that you don't see anything "wrong" with the figures ... and I can accept that.

Thanks again,
Ron. :drink:
 

dje

New Member
Yes, I don't see anything 'wrong' with the figures.

But if the highish BI write rate bugs you, then perhaps you might try winding the BI block size back to 8 kB. That should still give you enough BI write bandwidth not to bottleneck hopelessly during a billing run (note to non-Ron readers - I know the app he's running), but it should net a useful decrease in partial writes.

This isn't a suggestion I'd make to just anybody but it seems to be worth a try in this case.
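
For completeness: the change itself is an offline job. From memory it is something like the two commands below, but check the exact syntax - and the prerequisites (after-imaging disabled, a good backup taken) - against the Database Administration guide before touching a production database:

proutil dbname -C truncate bi -biblocksize 8
rfutil dbname -C aimage truncate -aiblocksize 8

Do both in the same outage (and then re-enable after-imaging) so the AI and BI block sizes stay matched.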

- David Eddy
 
Interestingly what you say means that increasing -Mf will have little or no effect, since it will be negated by the forced write of the BI block when the AI block is written.

In fact - increasing -Mf may make this worse.

Will the BI cluster size have any effect on this? I suspect so.

Going back to the faithful documentation on the groupdelay parameter, it may be helpful to reduce these to reduce the number of partial writes.

I doubt it will decrease the overall number of writes, and in my personal experience it is very hard to flood a SCSI bus on MB/s, but it is not so hard to flood it on the number of I/Os per second....
 

dje

New Member
toby.Harman said:
Interestingly what you say means that increasing -Mf will have little or no effect, since it will be negated by the forced write of the BI block when the AI block is written.

Yes, that's correct.

toby.Harman said:
In fact - increasing -Mf may make this worse.

Well, as always, "it depends". Increasing -Mf demands increased -bibufs and -aibufs... and you are using both BIW and AIW, aren't you? It can be useful if the BI and AI disks are very busy. Otherwise it doesn't get you much.

toby.Harman said:
Will the BI cluster size have any effect on this? I suspect so.

I don't think so. Cluster size only affects the frequency of checkpoints. Now a checkpoint will force a flush of all full BI blocks, but that's about it as far as I can see. There is no equivalent of the checkpoint on the AI side.

toby.Harman said:
Going back to the faithful documentation on the groupdelay parameter, it may be helpful to reduce these to reduce the number of partial writes.

Huh? -groupdelay is only used if -Mf is 0, so I don't see the relevance here.

toby.Harman said:
I doubt it will decrease the overall number of writes, and in my personal experience it is very hard to flood a SCSI bus on MB/s, but it is not so hard to flood it on the number of I/Os per second....

Well, I guess - but you're far more likely to flood the disk before you flood the bus.

- David Eddy
 
dje said:
Huh? -groupdelay is only used if -Mf is 0, so I don't see the relevance here.

- David Eddy

I was referring to the documentation about -groupdelay, which refers to reducing the BI block size and cluster size since that can actually help performance!

dje said:
Well, I guess - but you're far more likely to flood the disk before you flood the bus.

- David Eddy

Yes - apologies for the lack of clarity! I blame it on lack of sleep, and that I blame on my children (and the Australian censors' view of Buffy and Angel)!
 